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Summary 



Simulation of materials at the atomistic level is an important tool in study- 
ing microscopic structure and processes. The atomic interactions necessary 
for the simulation are correctly described by Quantum Mechanics. How- 
ever, the computational resources required to solve the quantum mechanical 
equations limits the use of Quantum Mechanics at most to a few hundreds 
of atoms and only to a small fraction of the available configurational space. 
This thesis presents the results of my research on the development of a 
new interatomic potential generation scheme, which we refer to as Gaus- 
sian Approximation Potentials. In our framework, the quantum mechani- 
cal potential energy surface is interpolated between a set of predetermined 
values at different points in atomic configurational space by a non-linear, 
non-parametric regression method, the Gaussian Process. To perform the 
fitting, we represent the atomic environments by the bispectrum, which is 
invariant to permutations of the atoms in the neighbourhood and to global 
rotations. The result is a general scheme, that allows one to generate inter- 
atomic potentials based on arbitrary quantum mechanical data. We built 
a series of Gaussian Approximation Potentials using data obtained from 
Density Functional Theory and tested the capabilities of the method. We 
showed that our models reproduce the quantum mechanical potential en- 
ergy surface remarkably well for the group IV semiconductors, iron and 
gallium nitride. Our potentials, while maintaining quantum mechanical ac- 
curacy, are several orders of magnitude faster than Quantum Mechanical 
methods. 
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1 Introduction 



Understanding the behaviour of materials at the atomic scale is fundamental 
to modern science and technology. As many properties and phenomena are 
ultimately controlled by the details of the atomic interactions, simulations 
of atomic systems provide useful information, which is often not accessible 
by experiment alone. Observing materials on a microscopic level can help 
to interpret physical phenomena and to predict the properties of previously 
unknown molecules and materials. To perform such atomistic simulations, 
we have to use models to describe the atomic interactions, whose accuracy 
has to be validated in order to ensure that the simulations are realistic. 

Quantum Mechanics provides a description of matter, which, accord- 
ing to our current knowledge, is ultimately correct, a conclusion which is 
strongly corroborated by experimental evidence. However, the solution of 
the Schrodinger equation — apart from a few very simple examples — has to 
be performed numerically using computers. A series of approximations and 
sophisticated numerical techniques has led to various implementations of 
the originally exact quantum mechanical theory, which can be now routinely 
used in studies of atomic systems. In the last few decades, as computational 
speed capacities grew exponentially, the description of more and more atoms 
has become tractable. In most practical applications, the electrons and the 
nuclei are treated separately, and the quantum mechanical description of 
the nuclei is dropped altogether. This simplification, namely, that the nu- 
clei move on a potential energy surface determined by the interaction of the 
electrons, already makes quantum mechanical calculations several order of 
magnitudes faster. However, determining macroscopic thermodynamical 
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CHAPTER 1. INTRODUCTION 



quantities of atomic systems requires a large number of samples of different 
arrangements of atoms, and the number of atoms has to be large enough 
to minimise finite-size effects. In fact, the computational costs associated 
with the solution of the Schrodinger equation are so large that the use of 
Quantum Mechanics is limited at most to a hundred of atoms and only a 
small fraction of the available configurational space. 

The demand for faster calculations to allow calculations of larger systems 
or the exploration of configurational space leads to the realm of analytical 
potentials, which are based on substituting the solution of the electronic 
Schrodinger equation with evaluation using an analytic function. Whereas 
the quantum mechanical description does not need validation — apart from 
ensuring that the errors introduced by the approximations are minimised — , 
analytic potentials have to be checked to determine whether the description 
remains valid. This is often done by comparing macroscopic quantities 
computed by the model to experimental values. There is a high degree 
of arbitrariness in the creation and validation of such potentials, and in 
practice it is found that they are significantly less accurate than Quantum 
Mechanics. 

As quantum mechanical calculations are becoming more widely avail- 
able, we have access to a large number of microscopic observables. The 
approach we present in this thesis is to create interatomic potentials based 
directly on quantum mechanical data which are fast and have an accuracy 
close to the original method. To achieve this, we have used a Gaussian Pro- 
cess to interpolate the quantum mechanical potential energy surface. The 
Gaussian Process is routinely used by the machine-learning community for 
regression, but it has never previously been adapted to represent the atomic 
potential energy surface. 

We describe the environment of the atoms by a vector, called the bis- 
pectrum, which is invariant to rotations, translations and permutation of 
atoms in the neighbourhood. Within the bispectrum representation, we 
regard the potential energy surface as a sum of atomic energy functions, 
whose variables are the elements of the bispectrum. Our approach for gen- 
erating interatomic potentials, which we collectively refer to as Gaussian 
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Approximation Potentials, has the favourable scaling and speed of analytic 
potentials, while the accuracy is comparable with the underlying quantum 
mechanical method. With Gaussian Approximation Potentials atomistic 
simulations can be taken to an entirely new level. 

1.1 Outline of the thesis 

The thesis is organised as follows. In chapter [2] I discuss the representation 
of atomic environments by the bispectrum. I show how the rotational in- 
variance of the bispectrum can be proved using Representation Theory and 
how the bispectrum is related to the widely used bond-order parameters. I 
summarise the Gaussian Process non-linear regression method we used in 
chapter |3} where I show the derivation of the formulae based on the Bayes' 
Theorem and the extensions which allowed us to use Gaussian Process for 
the regression of atomic potential energy surfaces. I describe a number of 
interatomic potentials and the Gaussian Approximation Potential in de- 
tail in chapter |4j Details of the computational methods, which we used to 
test our model, are given in chapter |5j Finally, I present our results on 
generating Gaussian Approximation Potentials for several systems and the 
validation of the models in chapter [6] 



2 Representation of atomic 
environments 

2.1 Introduction 

The quantitative representation of atomic environments is an important tool 
in modern computational chemistry and condensed matter physics. For ex- 
ample, in structure search applications [1], each configuration that is found 
during the procedure depends numerically on the precise initial conditions 
and the path of the search, so it is important to be able to identify equivalent 
structures or detect similarities. In other applications, such as molecular 
dynamics simulation of phase transitions [2], one needs good order param- 
eters capable of detecting changes in the local order around the atoms. In 
constructing interatomic potentials [3], the functional forms depend on ele- 
ments of a carefully chosen representation of atomic neighbourhoods, e.g. 
bond lengths, bond angles, etc. 

Although the Cartesian coordinate system provides a simple and un- 
equivocal description of atomic systems, comparisons of structures based 
on it are difficult: the list of coordinates can be ordered arbitrarily, or two 
structures might be mapped to each other by a rotation, refiection or trans- 
lation. Hence, two different lists of atomic coordinates can in fact represent 
the same or very similar structures. In a good representation, permuta- 
tional, rotational and translational symmetries are built in explicitly, i.e. 
the representation is invariant with respect to these symmetries, while re- 
taining the faithfulness of the Cartesian coordinates. If a representation is 
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complete, a one-to-one mapping is obtained between the genuinely different 
atomic environments and the set of invariants comprising the representa- 
tion. 

The most well known invariants describing atomic neighbourhoods are 
the set of bond-order parameters proposed by Steinhardt et al.[4j. These 
have been successfully used as order parameters in studies of nucleation[5j, 
phase transitions [1] and glasses[7]. In the following sections we show that 
the bond-order parameters actually form a subset of a more general set of 
invariants called the bispectrum. We prove that the bispectrum compo- 
nents indeed form a rotational and permutational invariant representation 
of atomic environments. The formally infinite array of bispectral invariants 
provide an almost complete set, and by truncating it one obtains represen- 
tations whose sensitivity can be refined at will. 

2.2 Translational invariants 

The concept of the power spectrum and the bispectrum was originally in- 
troduced by the signal processing community. In the analysis of periodic 
signals the absolute phase is often irrelevant and a hindering factor, for 
example, when comparing signals. The problem of eliminating the phase of 
a periodic function is very similar to the problem of creating a rotationally 
invariant representation of spatial functions. We show how the bispectrum 
of periodic functions can be defined and discuss its possible uses in atomistic 
simulations. 

2.2.1 Spectra of signals 

A periodic signal f{t) (or a function defined on the circumference of a circle) 
where t G [0,27r), can be represented by its Fourier series: 




(2.1) 
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Table 2.1: Fourier and power spectrum coefficients of /i and /2. 



where the coefficients, /„, can be obtained as follows: 



fn = ^ I fit) exp{-iunt)dt. (2.2) 

Jo 

A phase shift of the signal (or rotation of the function) by to transforms the 
original signal according to 

/(t)->/(t + to), (2.3) 

and the coefficients become 

fn exp(zw„to)/n- (2.4) 
It follows that the power spectrum of the signal defined as 

Pn = rjn (2.5) 

is invariant to such phase shifts: 

Pn = fnfn (/„ exp(iWnto))* (/„ exp(iia;„to)) = /n/n, (2.6) 



but the information content of different channels becomes decoupled. Figure 2.1 
and table |2.1 demonstrate two functions, /i = sin(t) + sin(2t) and /2 = 



sin(t) + cos(2t), that can both be represented by the same power spectrum. 



2.2.2 Bispectrum 

As the power spectrum is not complete, i.e. the original function cannot 
be reconstructed from it, there is a need for an invariant representation 
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Figure 2.1: Two different periodic functions that share the same power 
spectrum coefficients. 

from which the original function can (at least in theory) be restored. The 
bispectrum contains the relative phase of the different channels, moreover, 
it has been proven to be complete[8]. 

A periodic function / : — )■ C, whose period is Lj in the i-th direction, 
can be expressed in terms of a Fourier series: 

/(r) = 5^/(a;)exp(^a;r), (2.7) 

where the Fourier-components can be obtained from 

^('^) = IIt. f ^^""^ exp(za;r)dr (2.8) 

and oj = {ui,U2, ■ ■ ■ , Wn)- An arbitrary translation T(ro) transforms / as 
/(r) — )• /(r— fq), thus the Fourier-coefficients change as /(w) — )■ exp(— iujro)/( 
The bispectrum of / is defined as the triple-correlation of the Fourier coef- 
ficients: 

6(a;i,a;2) = f{u;,)f{u;2)fiu;i + 0^2)*. (2.9) 
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The bispectrum is invariant to translations: 



b{LJi, LJ2) fi^^i) exp(i - LL>iro)f{LJ2) exp(i - ^2^0) 
X + LJ2y exp {i{u:i + a;2)ro) 



b{u}^,u}2). (2.10) 



The bispectrum has been shown to be complete|8]. The proof, which 
is highly technical and would be too long to reproduce here is based on 
Group Theory. Further, Dianat and Raghuveer proved that in case of one- 
and two-dimensional functions the original function can be restored using 
only the diagonal elements of the bispectrum, i.e. only the components for 
which uji = UJ2^- 

2.2.3 Bispectrum of crystals 

Crystals are periodic repetitions of a unit cell in space in each of the three 
directions defined by the lattice vectors. A unit cell can be described as a 
parallelepiped (the description used by the conventional Bravais system of 
lattices) containing some number of atoms at given positions. The three 
independent edges of the parallelepiped are the lattice vectors, whereas the 
positions of the atoms in the unit cell form the basis. Defining crystals 
in this way is not unique, as any subset of a crystal which generates it by 
translations can be defined as a unit cell, for example, a Wigner-Seitz cell, 
which is not even necessarily a parallelepiped. 

Thus a crystal can be described by the coordinates of the basis atoms 
Fj, where i = 1, . . . ,N and the three lattice vectors a^,, a = 1,2, 3. The 
position of the basis can be given in terms of the fractional coordinates Xj, 
such that 



where < Xia < 1. 

In the same way as in the case of atomic environments, the order of the 
atoms in the basis is arbitrary. We introduce the permutational invariance 
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through the atomic density: 

p(x) = 5^5(x-x,). (2.12) 

i 

p is a periodic function in the unit cube, therefore we can expand it in a 
Fourier series and calculate invariant features such as the power spectrum 
and bispectrum. It can be noted that the power spectrum of p is equivalent 
to the structure factor used in X-ray and neutron diffraction, and it is clear 



from Section 2.2.1 why the structure factor is not sufficient to determine 
the exact structure of a crystal. In contrast, the bispectrum of the atomic 
density function could be used as a unique fingerprint of the crystal that is 
invariant to the permutation and translation of the basis. 

We note that permuting the lattice vectors of the crystal permutes the 
reciprocal lattice vectors which therefore, mixes the elements of the bispec- 
trum. This problem can be eliminated by first matching the lattice vectors 
of the two structures which are being compared. The rotation of the entire 
lattice does not change the fractional coordinates, hence the bispectrum is 
invariant to global rotations. 



2.3 Rotationally invariant features 

Invariant features of atomic environments can be constructed by several 
methods, of which we list a few here . In interatomic potentials, a set 
of geometric parameters are used, such as bond lengths, bond angles and 
tetrahedral angles. These are rotationally invariant by construction, but 
the size of a complete set of such parameters grows as exp(A^), where is 
the number of neighbours. The complete set is vastly redundant, but there 
is no systematic way of reducing the number of parameters without losing 
completeness. 

A more compact rotationally invariant representation of the atomic en- 
vironment can be built in the form of a matrix by using the bond vectors rj, 
i = 1, . . . , N between the central atom and its neighbours. The elements 
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of the matrix are given by the dot product 

Mij = ri-rj. (2.13) 

Matrix M contains the bond lengths on its diagonal, whereas the off- 
diagonal elements are related to the bond angles. It can be shown that 
M is a complete representation [TO] . However, permuting the neighbour- 
ing atoms shuffles the columns and rows of M, thus M is not a suitable 
invariant representation. 

Permutational invariance can be achieved by using the symmetric polynomials jllj. 
These are defined by 

Ilk{xi,X2, . . .,xn) = Ilk{xj,^,Xn2, ■ ■ • ,3;^jv) (2.14) 

for every tt, where tt is an arbitrary permutation of the vector (1,2,..., N). 
The first three symmetric polynomials are 

N 

Ui{xi,X2, . . . ,xn) = ^Xi (2.15) 

i 

N 

n2(Xi,X2, . . . ,XAr) = ^XjXj (2.16) 

i<j 
N 

Il3{xi,X2, . . . ,Xn) = ^ XiXjXk. (2.17) 
i<j<k 

The series of polynomials form a complete representation, however, this set 
is not rotationally invariant. 

2.3.1 Bond-order parameters 

As a first step to derive a more general invariant representation of atomic 
environments, we define the local atomic density as 

p.(r) = ^(5(r-r,,), (2.18) 
j 

where the index j runs over the neighbours of atom i. The local atomic 
density is already invariant to permuting neighbours, as changing the order 
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of the atoms in the neighbour hst only affects the order of the summation. 
This function could be expanded in terms of spherical harmonics (dropping 
the atomic index i for clarity): 

I 

1=0 m=-l 

However, we should note that this representation does not contain informa- 
tion about the distances of neighbours. In fact, p(r) represented this way is 
the projection of the positions of neighbouring atoms onto the unit sphere. 
The properties of functions defined on the unit sphere are described by the 
group theory of SO (3), the group of rotations about the origin. 

The spherical harmonics functions form an orthonormal basis set for L2: 

(^ml^i'm') = ^ll'^mm', (2.20) 

where the inner product of functions / and g is defined as 

{f\g) = J nr)g{r)dr. (2.21) 
The coefficients cim can be determined as 

Clm = {p\Ylm) = J2 ^™ (^(^^j)' '^(^^j)) • (2.22) 

j 

We note that the order parameters Qim introduced by Steinhardt et 
al[l] are proportional to the coefficients q^- In their work, they defined the 
bonds in the system as vectors joining neighbouring atoms. Defining which 
atoms are the neighbours of a particular atom can be done by using a simple 
distance cutoff or via the Voronoi analysis. Once the set of neighbours 
has been defined, each bond connecting neighbour atoms i and j is 
represented by a set of spherical harmonics coefficients 

l/™(r^,) = l«™(^(r.,),0(r,,)). (2.23) 

Averaging the coefficients for atom i provides the atomic order parameters 
for that atom 




(2.24) 
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where iVj is the number of neighbours of atom i. Similarly, averaging over 
all bonds in the system gives a set of global order parameters 

Qlm = ^m(rjj)5 (2.25) 

where Nh is the total number of bonds. Both of these order parameters are 
invariant to permutations of atoms and to translations, but they still depend 
on the orientation of the reference frame. However, rotationally invariant 
combinations of these order parameters can be constructed as follows 

1/2 



^ E (Qim)*QL and (2.26) 

m=—l / 

I 



Wi= J2 i ^ ^ ^ ]QlmMm,QU (2-27) 

, \ mi mo m^ ' 

mi,r7i2,m3=— J \ ^ ^ ^ 



for atoms and 



I \ 1/2 

4:71 



= ^ E QlrnQlm (2.28) 



m= 

I 



Wi= V M ^ ^ \Ql^,Qlm,Qlm, (2.29) 

, \ m^ mo m-j / 

mi,m2,m3=— J \ ^ ^ / 

for global structures. The factor in parentheses is the Wigner-3jm symbol, 
which is nonzero only for mi + m2 + ms = 0. 

Q\ and Wl are called second-order and third-order bond-order param- 
eters, respectively. It is possible to normalise such that it does not 
depend strongly on the number of neighbours as follows: 

/I \3/2 

Wl = wii E iQlmTQln. ■ (2.30) 

\rn=-l J 

Bond-order parameters were originally introduced by Steinhardt et al|l] 
for studying the order in liquids and glasses, but their approach was adopted 
soon for a wide range of applications. For example, the bond-order param- 
eters, when averaged over all bonds in the system, can be used as reaction 
coordinates in phase transitions [12j. 
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For symmetry reasons, bond order parameters with / > 4 have non- 
zero values in clusters with cubic symmetry and / > 6 for clusters with 
icosahedral symmetry. The most widely calculated bond order parameters 
are / = 4 and 1 = 6. Different values correspond to crystalline materials 
with different symmetry, while the global values vanish in disordered phases, 
such as in liquids. This feature made the Q and W invariants attractive for 
use as bond order parameters in many applications. 

2.3.2 Power spectrum 

Using some basic concepts from representation theory, we can now prove 
that the second-order invariants are rotationally invariant, then we show 
a more general form of invariants, a superset consisting of third-order 
invariants[13J. An arbitrary rotation R operating on a spherical harmonic 
function Yim transforms it into a linear combination of spherical harmonics 
with the same I index: 

I 

m^= J2 D^m^'imn.', (2.31) 

m'=—l 

where the matrices D'^')(i?) are also known as the Wigner-matrices. The 
elements of the Wigner matrices can be generated by 

^w(^) = {Yimim^'). (2.32) 
It follows that the rotation operator R acts on the function p as 
I I 

Rp= R^^ ^ ClmYlm = XI XI (^IrnRyim 
1=0 m=—l 1=0 m=—l 

I I 

= E E E cimDi:!^,{R)Y,^> = 

1=0 m=—l m'=—l 

I 

= E E (2.33) 

1=0 m'=-l 

thus the vector of coefficients transform under rotation as 

Q^D«(i?)Q. (2.34) 
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Making use of the fact that rotations are unitary operations, it is possible 
to show that the matrices D*^'^ are unitary, i.e. 

(D«)^D« = I, (2.35) 

leading us to a set of rotationally invariant coefficients, the rotational power 
spectrum: 

Pi = clci. (2.36) 
The coefficients of the power spectrum remain invariant under rotations: 

Pi = cjc, ^ (cj (D«)^) (D«c,) = cjc, (2.37) 

It can be directly seen that the second-order bond-order parameters are 
related to the power spectrum via the simple equation 



Q--[^,P') ■ (238) 

The power spectrum is a very impoverished representation of the original 
function p, because all pi coefficients are rotationally invariant indepen- 
dently, i.e. different / channels are decoupled. This representation, although 
rotationally invariant, is, in turn, severely incomplete. 

The incompleteness of the power spectrum can be demonstrated by the 
following example. Assuming a function / in the form 

h h 
/(f) = + Yl l^rnYi.mi^), (2-39) 

m=—li m=~l2 

its power spectrum elements are pi^ = lap and pi-^ = |/3p. Thus only the 
length of the vectors ck and f3 are constrained by the power spectrum, their 
relative orientation is lost, i.e. the information content of channels li and 



I2 becomes decoupled. Figure |2.2| shows two different angular functions, 
fi = Y22 + Y2-2 + 133 + r3_3 and /2 = Y21 + Y2-1 + Y32 + F3-2 that have the 
same power spectrum p2 = 2 and ps = 2. 
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Figure 2.2: Two different angular functions that share the same power 
spectrum coefficients. 



2.3.3 Bispectrum 

We will now generalise the concept of the power spectrum in order to obtain 
a more complete set of invariants via the coupling of the different angular 
momentum channels [T3] . Let us consider the direct product C;^ ® C;^, which 
transforms under a rotation as 



(2.40) 



It follows from the representation theory of groups that the direct product 
of two irreducible representations can be decomposed into direct sum of 
irreducible representations of the same group. In case of the S0(3) group, 
the direct product of two Wigner-matrices can be decomposed into a direct 
sum of Wigner-matrices in the form 

h+h 

j^ih) ^j^ih) ^ ^Qiuhy nil) r^iuh 



D(') 

l=\h-l2\ 



(2.41) 



where C'^''^ denote the Clebsch-Gordan coefficients. The matrices of Clebsch- 
Gordan coefficients are themselves unitary, hence the vector C'^''^ (c;^ ci^) 
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transforms as 
We define gi.^i^^i as 



i=|h-/2| 



C'^''^ {ci, (g) . (2.42) 



h+h 



= C'l''^ (q, c^J , 

/=Kl-i2| 



(2.43) 



i.e. the gii^i2,i is that part of the RHS which transforms under rotation as 

ShM ^ ^^'^ShM- (2-44) 

Analogously to the power spectrum, the bispectrum components or cu- 
bic invariants, can be written as 



which are invariant to rotations: 



(2.45) 



(2.46) 



Kondor showed that the bispectrum of the SO (3) space is not complete, 
i.e. the bispectrum docs not determine uniquely the original function. This 
is a deficiency due to the fact that the unit sphere, 5*2 is a homogeneous 
space. However, he states that the bispectrum is still a remarkably rich 
invariant representation of the function. 

Rewriting the bispectrum formula as 

I h h 

bh,l2,l = (^Im^l^ihmQ^hmiCl^mi, (2.47) 

m=—l mi=— /i m2=—l2 

the similarity to the third-order bond-order parameters becomes apparent. 
Indeed, the Wigner 3jm-symbols are related to the Clebsch-Gordan coeffi- 
cients through 



li I2 I3 
mi 1712 ms 



^/2l^ /imii2m2- 



(2.48) 
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For the spherical harmonics = (— thus the third-order pa- 
rameters Wi are simply the diagonal elements of the bispectrum bi^^i up to 
a scalar factor, and thus, the bispectrum is a superset of the third-order 
bond-order parameters. Further, considering that Yqq = 1, therefore the co- 
efficient Coo is simply the number of neighbours A^, and C^°om2 ~ ^i,h^m,m2y 
we notice that the bispectrum elements h = 0, I = I2 are the power spec- 
trum components, previously introduced: 

I I I 

bl,0,l = NiY^ C*i^Sm,ra2Clm2 =^1^ ^mClm = ^iPi- (2.49) 

m=—lTn2=—l m=—l 

Finally, the relationship between the bond-order parameters and the 
bispectrum can be summarised as 

VK^i (2-50) 
Wi oc bi,i,i. (2.51) 

2.3.3.1 Radial dependence 

The bispectrum is still a very incomplete representation, as it uses the 
unit-sphere projection of the atomic environment, i.e. the distance of 
the atoms from the centre is not represented. One way to improve this 
shortcoming — namely, the lack of radial information — is to introduce radial 
basis functions [14j, completing the basis for three-dimensional space. In 
equation |2.19[ we use the product of spherical harmonics and a linearly 
independent set of radial functions gn- 

I 

Z^^^) = Z E E (^nimgn{r)Yi^ (^(r), 0(r)) . (2.52) 

n 1=0 m=—l 

If the set of radial basis functions is not orthonormal, i.e. {gn\gm) = Snm 7^ 
5nm-i after obtaining the coefficients c'„;^ with 

4m = {9nYim\p), (2.53) 
the elements Cnim are given as 

Cnlm = Y.i^'')n'n^n'lm- (2-54) 
n' 
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Figure 2.3: Two possible sets of radial basis functions, Gaussians centred 
at different radii. The narrow Gaussians are more sensitive to changes in 
radial positions, but the coupling between them is weaker. 



In practice, when constructing the invariants, both c'^i^ and Cnim can be 
used. 

Rotational invariance only applies globally, therefore the different an- 
gular momentum channels corresponding to various radial basis functions 



need to be coupled. Simply extending equation 2.47 to the form 



I h h 

bn,li,l2,l = ^nlm^hmil2m2'^nlimiCnl2m2y (2.55) 

m=—l mi=—li m2=—l2 

provides a set of invariants describing the three-dimensional neighbourhood 
of the atom. In fact, this formula can easily lead to a poor representation, 
if the radial functions have little overlap with each other, as the coefficients 
belonging to different n channels become decoupled. To avoid this, it is 
necessary to choose wide, overlapping radial functions, although this greatly 
reduces the sensitivity of each channel. The fine-tuning of the basis set is 
rather arbitrary, and there does not necessarily exist an optimum for all 
systems. An alternative way to construct invariants from c is to couple 
different radial channels, for example, as 

I h h 

bni,n2,li,l2,l ~ ^ ^ ^ ] ^ ^ ^nilm^limil2m2^"-2hmiCn2l2m2- (2.56) 
m=—l m\=—l\ m2=—l2 

Now we ensure that radial channels cannot become decoupled, but at the 
price of increasing the number of invariants quadratically. Although adding 
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Figure 2.4: Projection of a line to a circle (left panel), 
dimensional plane onto the three-dimensional sphere 



projection the two- 
(right panel). The 



projection we use is in equation 2.57 the generalisation to one more dimen- 
sion. 



a suitable set of radial functions allows one to construct a complete rep- 
resentation, we found this approach overly complicated. A high degree of 
arbitrariness is introduced by having to choose a radial basis. 

2.3.4 4-dimensional bispectrum 

Instead of using a rather arbitrary radial basis set, we propose a general- 
isation of the power spectrum and bispectrum that does not require the 
explicit introduction of a radial basis set, yet still forms a complete basis of 
three-dimensional space. We start by projecting the atomic neighbourhood 
density onto the surface of the four-dimensional unit sphere, in a similar 
fashion to the Riemann-construction: 



y 



(j) = arctan(?//x) 
^ e = arccos(2/|r|) , (2.57) 
^0 = |r|/ro 



where ro > rcut/^r. Using this projection, rotations in the three-dimensional 



space correspond to rotations in the four-dimensional space. Figure 2A_ 
shows such projections for 1 and 2 dimensions, which can be more easily 
drawn than the three-dimensional case that we use here. 

An arbitrary function p defined on the surface of a 4D sphere can 
be numerically represented using the hyperspherical harmonics functions 
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oo j 

P=J2 H ^m'mVlm- (2-58) 
j=0 m,m'=—j 

The hyperspherical harmonics form an orthonormal basis set, thus the ex- 
pansion coefficients c^/^ can be calculated via 

<'^'m = {K'Jp)^ (2.59) 

where (.|.) denotes the inner product in 4-dimensional space. Although 
the coefficients c'^,^ have two indices for each j, they are vectors and, for 
clarity, we denote them as cK Similarly to the three-dimensional case, a uni- 
tary operation i?, such as a rotation, acts on the hyperspherical harmonics 
functions as 

RU^ , W , , W , , (2.60) 

m'2m2 

where the matrix elements R'' , , are given by 

i^^ , =(U', \R\U', ). (2.61) 

Hence the rotation R acting on p transforms the coefficient vectors o? ac- 
cording to 

Q? ^ R^c^'. (2.62) 

R.^ are unitary matrices, i.e. (R-')^ R-' = I. 

The product of two hyperspherical harmonics functions can be expressed 
as the linear combination of hyperspherical harmonics [15j: 

rr'i llh _ plm plm ttI /<-) 

^m.[mi^m'2m2 ~ 2-^ '^Iimil2m2'^hmil2m2^ m'my {^.VO) 

l=\h-l2\ 

where Ci"L , ^ are the well-known Clebsch-Gordan coefficients. We can 



recognise in equation 2.63 the four dimensional analogues of the Clebsch- 
Gordan expansion coefficients, defined as Hl'^^'^,^ ^^^^^,^ = Cl^„^,i2m2Cl%[i2mi 
Using the matrix notation of the expansion coefficients, it can be shown that 
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the direct product of the four-dimensional rotation matrices decompose ac- 
cording to 



(2.64) 



J=\jl-j2\ 

The remainder of the derivation continues analogously to the 3D case. Fi- 
nally, we arrive at the expression for the bispectrum elements, given by 

jl 32 j 

B = \^ \^ \^ (r' YC^^ C^^' r'^ r'^ 

31,32,3 / J / J / J y^m'm) ^ j\rnxj2rn2 jim'^j2rn'^ m'^mi rn'^m.2' 

m'j^,mi=—ji m'2,m2=—j2 m' ,m=—j 

(2.65) 

Note that the 4D power spectrum can be constructed as 

P3= E iin'JC^'m- (2.66) 

m' ,m=—j 

The 4D bispectrum is invariant with respect to rotations of four- dimensional 
space, which include three-dimensional rotations. However, there are ad- 
ditional rotations, associated with the third polar angle 60, which, in our 
case, represents the radial information. In order to eliminate the invariance 
with respect to the third polar angle, we modified the atomic density as 
follows 

p,(r) = 5(0) + 5^(5(r-r,,), (2.67) 

3 

i.e. by adding the central atom as a reference point. 

The magnitude of the elements of the bispectrum scale as the cube of 
the number of neighbours, so we take the cube-root of the coefficients in 
order to make the comparison of different spectra easier. 

2.3.5 Results 

In practice, the infinite spherical harmonic expansion of the atomic neigh- 
bourhood is truncated to obtain a finite array of bispectral invariants. In 



Figure 2.5 we show the 4D bispectra of atoms in a variety of environments. 



truncated to j < 4, which gives 42 bispectrum coefficients. In each case 
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Figure 2.5: 4D bispectra of atoms in various structures: a) fcc/hcp/bcc 
lattices with a first neighbour cutoff; b) fcc/hcp/bcc lattices with a second 
neighbour cutoff; c) hexagonal and cubic diamond lattice; d) expansion of a 
diamond lattice; e) bulk diamond, (111) surface of diamond and graphene; 
f) fee vacancy; g) the A and B atoms in a zincblende structure, compared 
with diamond. 



It can be seen from figure 2^ that the bispectrum is capable of dis- 
tinguishing very subtle differences in atomic neighbourhood environments. 
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Some points of particular interest are the following. The difference between 
the face-centred cubic (fee) and the hexagonal close-packed (hep) structures 
is very small within the first neighbour shell, as is the difference between the 
corresponding bispectra (panel a). However, the difference is much more 
pronounced once second neighbours are included (panel b). The difference 
between the cubic and hexagonal diamond lattices is the stacking order 
of the (111) sheets. The positions of the four nearest neighbours and nine 
atoms of the second-nearest neighbour shell are the same and, only the posi- 



tions of the remaining three neighbours are different, as shown in figure 2.6 



The curves in figure 2.5 3 reflect the similarity of these two structures: most 



Figure 2.6: Cubic and hexagonal diamond. Cubic diamond is shown in the 
left panel. 

of the bispectrum coefficients are equal, except a few, which can be used 
for distinguishing the structures. Figure [23 i shows the bispectra of three 



atoms in perfect diamond lattices, which differ in the lattice constants. This 
plot illustrates the sensitivity of the bispectrum in the radial dimension be- 
cause the expansion of a lattice leaves all angular coordinates the same. 
It can be seen that the first element of the bispectrum array remains the 
same, because this is proportional only to the number of neighbours. 

We performed the principle component analysis [16] on the bispectra of 
atoms in a slab of silicon. On the surface of the slab, the atoms were ar- 
ranged according to the 7x7 reconstruction [T7j. The position of the atoms 
were randomised by 0.3 A. We projected the 42- dimensional space of the 
bispectrum — which corresponds to j < 4 — to the two-dimensional plane 
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Figure 2.7: Principle component analysis of the bispectrum of atoms on 
the 7x7 reconstruction of the (111) surface of silicon. 



and clustered the points using the k-means algorithm |1 8]. In figure 2.7 



we 



show the result of the principle component analysis. Different colours are 
assigned to each cluster identified by the k-means method, and we coloured 
the atoms with respect to the cluster they belong. This example demon- 
strates that the bispectrum can be used to identify atomic environments in 
an automatic way. 

It is straightforward to describe multi-species atomic environments us- 
ing the bispectrum. We modify the atomic density function defined in 
equation |2.18| as 

p,(r) = s,6{0) + J2 ^i<^(r - r^,), (2.68) 
j 

where s contains an arbitrary set of coefficients, different for each species. 



which are thus distinguished. Figure |2.5^ shows the resulting bispectra for 
the two different atoms in the zincblende lattice, as well as the diamond 
lattice for comparison. It can be seen that the bispectrum successfully 
distinguishes between the different species. 



3 Gaussian Process 



3.1 Introduction 

Regression methods are important tools in data analysis. Parametric mod- 
els can be expressed in functional forms that contain free parameters that 
are fitted such that the models reproduce observations. The model can 
often be formulated in a way that the functional form is a linear combina- 
tion of the parameters. The fitting procedure in such cases is called linear 
regression. Non-linear regression is needed if the functional form cannot be 
expressed as a simple linear combination of the parameters, but this case 
does not differ conceptually from the linear case. However, there is often 
no theory or model describing a particular process — or it is just too com- 
plicated to write the model in a closed functional form — , but it is still im- 
portant to make predictions of the outcome of the process. Non-parametric 
approaches, such as neural networks or Gaussian Processes, can be used 
to approximate the underlying function given a set of previously collected 
data. As neural network methods form a subset of Gaussian Processes |19]. 
we decided to use the latter approach in our work. 

3.2 Function inference 

Gaussian Processes predict the values of a function whose form is not ex- 
plicitly known by using function observations as evidence. If t = 
are values of a function / : — t- M measured at the points X = {xj},^]^ 
with some error, predicting the value t^+i at Xjv+i can be formulated as a 
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Bayesian inference problem. Bayes' theorem states that 

P{tN^i\t) - oc P{t\t^^,)P{t^^,), (3.1) 

where P{tN+i) is a Gaussian prior on the function space. It is possible to 
introduce a Gaussian prior on function / as 

/(x) = ^^/;,0,(x) (3.2) 

h 

where {(l>h}h=i form a complete basis set and the distribution of w is a 
Gaussian with zero mean and variance ah- Wh ~ N{0,ah)- Each function 
value fn is a linear combination of the basis functions: 

H H 

fn = ^ Wh<Ph{^n) = ^ WhRnh, (3-3) 

h=l h=l 

where Rnh = <Ph{^n)- The covariance matrix of the function values f is the 
matrix of expectation values 

Q = (f F) = (Rww^R^) = R(ww^)R^ = a^RR^ (3.4) 

Thus the prior distribution of f is A^(0, Q) = A^(0, a^RR^). However, 
each measurement contains noise, which we assume to be Gaussian with 
zero mean and variance a^. The vector of data points also has Gaussian 
distribution: P(t) ~ iV(0, Q + cr^). We denote the covariance matrix of t 
byC^Q + a^I. 

The distribution of the joint probability of observing ^at+i having pre- 
viously observed t can be written as 

P{tM+l\t)^P{[ttN+l\), (3.5) 

where P{\ttN+i\) ~ -^(0, Cat+i), or exphcitly 

P{[Un+i]) oc exp {^-]^[UN+iYC],\^[Un+i]^ ■ (3.6) 
The covariance matrix Cjv+i and its inverse can be written as 



■'N+l 



(3.7) 
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and 



M m 



The submatrices of Cj^l^_^ can be calculated via 



Cat+iCjv+i 

which leads to 



CAfM + km^ CArm + mk 
k^M + /tm^ k^m + Km 



m 
m 



(K-k^C^^k)-^ 



-mC^^k 
1 



M = C^^ H mm^ 

m 



Substituting these into equation 3.6 we obtain 



P(tAr+i|t) cx exp 



{tN+l — tN+l) 

ijv+i 



where the new variables t^+i and o"; are defined as 



(3.9) 



(3.10) 
(3.11) 

(3.12) 



(3.13) 



(3.14) 



and 



K 



(3.15) 



i.e. ^AT+i has Gaussian distribution with mean ^Ar+i and variance af . We 
use this formula to predict function values and error bars. 



N+l 



Figure 3J^ shows a one-dimensional example of the Gaussian Process 
regression. We sampled an arbitrary function at ten random points between 
the interval (^, |) and used these samples as the training points. We present 
the predicted values and the predicted errors in the entire interval (0, 1). It 
can be seen that inside the fitting region, the predicted values are very close 
to the original functions, and the predicted variance is also small. Outside 
the fitting region, the prediction is meaningless, and this is indicated by the 
large variance. 
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Figure 3.1: Gaussian Process regression in one dimension. The original 
function (dotted line) was sampled at ten random points (open squares). 
The predicted function values (solid line) and the errors (dashed line) are 
shown. 



3.2.1 Covariance functions 



The elements of the covariance matrix Q defined in equation 3.4 can be 
determined as 

Qnn' = 0"! X] RnhRn'h = (^h^ (p hM <P h{^n') ■ (3.16) 



In our work, we used Gaussians centred at different points as basis functions. 
In one dimension, these would have the form 



bh{x) = exp 



2r2 



(3.17) 



If the basis set consists of infinitely many basis functions which are dis- 



tributed uniformly, the summation in equation 3.16 can be replaced by an 
integration: 



Qnn' ^ 



/exp(- 



2^^2 



exp 



2r2 



dr. 



(3.18) 
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The integral of the product of two Gaussian is also a Gaussian, leading to 
the final expression — also known as the kernel — of the covariance matrix 
elements 



where 6 and 9 are usually referred to as hyperparameters. This finding 
demonstrates that the Gaussian Process method is, in fact, an example of 
non-parametric regression with infinitely many basis functions, but where it 
is not necessary to determine the coefficients of the basis functions explicitly. 
We note that using Gaussians as basis functions is a convenient choice, as 
the elements of the covariance matrix can be calculated analytically using a 
simple Gaussian kernel, but depending on the nature of the target function, 
there is a large variety of alternative basis functions and kernels. 

In the case of multidimensional input data, the Gaussian kernel could 
be modified such that different length scales are associated with different 
directions: 



where the vector = {6i}^i contains the typical decorrelation length of the 
function in each dimension i. If we assume that the initial Gaussian basis 
functions are not aligned in the directions of the original input vectors, the 
kernel can be written in the form 



where is the matrix of hyperparameters. 
3.2.2 Hyperparameters 

The choice of hyperparameters S, and ai, depends strongly on the dataset. 
represents the width of the basis functions, i.e. it characterises the typical 
length scale over which the function values become uncorrelated. 5 places 
a prior on the variance of the parameter vector w, describing the typical 
variance of the function, while ai, is the assumed noise in the measured 




(3.19) 




(3.20) 




(3.21) 
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data values. Ideally, a prediction for ^Ar+i would be made by evaluating the 
integral 

P(t;v+i|a;^+i,t,X) = y"p(t^+i|x^+i,t,X,h)P(h|t,X)dh, (3.22) 

but depending on the model, the analytic form of the integral may or may 
not be known. Although it is always possible to carry out the integra- 
tion numerically, for example, by Markov chain Monte Carlo or Nested 
Sampling[2n], a computationally less demanding method is to approximate 
the integral at the most probable value of h. It is often possible to choose 
good hyperparameters based on known features of the function, but the 
hyperparameters can also be optimised if needed. If we consider the prob- 
ability distribution of a hyperparameter set h given a dataset D: 

P{h\D) oc P{D\h)P{h), (3.23) 

optimal hyperparameters can be obtained by maximising this probability, 
known as the marginal likelihood. Assuming a uniform prior on the hy- 



perparameters and using the result found in equation 3.4, i.e. P(t|X) ~ 
A^(0, C), the logarithm of the likelihood is 

lnP(t|X,h) = -^t^C-H - ^IndetC - y ln27r. (3.24) 

Maximising the logarithm of the likelihood with respect to the hyperpa- 
rameters can be performed by gradient-based methods such as Conjugate 
Gradients[21j, where that gradients can be calculated as 

dhi 2 dhi 2 V 9hJ ^ ' 

3.2.3 Predicting derivatives and using derivative 
observations 

Predicting the values of derivatives using a Gaussian Process can be per- 



formed by simply differentiating the expectation value t in equation |3.14 
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The elements of k are given by the covariance function, hence we need to 
differentiate the covariance function, 

dkn dC{^n,^) 



dxi 



dxi 



which gives 



dkn 
dxi 



n 



(3.27) 



(3.28) 



in the case of Gaussian kernels. 

It is also possible that values of derivatives have been measured and these 



are also available. In order to use this data, we differentiate equation 3.2 

df ^ d(j)h 
= 2^Wh 



dxi 



dxi 



(3.29) 



thus we need to substitute ^ for the basis functions in equation 
give 



3.16 



to 



Qnn' — O'l RnhRn'h — O'l 



h 



h 



dXi 



Qnn' — CTh RnhRn'h — CTh 



dXi 



d(i)h 



(3.30) 



(3.31) 



For Gaussian kernels, the covariance between a derivative and a function 
value observation is 

(■^nfc ^n'k) 



Qr 



•^ni ■^n'i r2 



6 exp 



(3.32) 



or between two derivative observations the covariance is 



Qr 



1 1 Xfii ^n'i '^nj '^n'j 



^1 



5^ exp 



(•^nfe ■^n'k) 



01 



(3.33) 

Finally, if the function is a composite function of the form /(x) = /(y(x)) 
and the derivatives ^ are available, the Gaussian covariance function be- 
tween a derivative (ra-th) and function value (n'-th) observation is 

1 {Vnk — Vn'kY 



Qr 



EUnk — Vn'k dynk ^2 I 
^2 exp I -- 



01 



dxi 



(3.34) 
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and between two derivative observations 4^ and ^ is 

clxi axj 



Q-^' = I E ei^^ - ^-j ^ 1-2 >: ^ I ' (3-35) 



with 



Using the same model for observations of function values and their 
derivatives enables us to incorporate the available information into a single 
regression allowing us to infer both function values and derivatives. 

Since there is no reason to assume that the noise is the same in case 
of both the function value and derivative observations, we use two distinct 
noise hyperparameters. 



3.2.4 Linear combination of function values 

It is possible that linear combinations of function values can be observed 
during the data collection process: 

fm = -^mn/(Xn) = ''^^ LmnRnhWh- (3.37) 



If this is the case, equation 3.4 is thereby modified, so the covariance matrix 



of the observed values can be obtained as 

Q' = (f'f ^) = (LRww^R^L^) = a^LRR^L^ = LQL^. (3.38) 



In our work, equation |3.38| proved to be very useful, as only the total en- 
ergy of an atomic system can be obtained using quantum mechanical cal- 
culations. However, we view the energy as arising from the sum of atomic 
contributions. Thus, in this case, the matrix L describing the relationship 
of the observations (total energy) to the unknown function values (atomic 
energies) consists of zeros and ones. 
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3.2.5 Sparsification 

Snelson and Ghahramani [22] introduced a modification to the standard 
Gaussian Process regression model for large, correlated data sets. The 



computational cost of the training process described in equation |3.13| scales 
as the cube of the number of data points, due to the computational cost of 
inverting the covariance matrix. In case of large data sets, the training pro- 
cess can become computationally expensive. Although the computational 
cost of predicting function values scales linearly with the number of teach- 
ing points, this cost can also be computationally demanding. If the data set 
is highly correlated, i.e. observations are made at closely spaced points, it 
is feasible to use a sparse approximation of the full Gaussian Process, which 
has significantly reduced computational requirements but only a little less 
accuracy. 

We used the sparsification procedure described in j22j. In the sparsifi- 
cation procedure, a set of M pseudo-inputs {xm}m=i chosen from the 
full dataset of N input values {'Xn}n=i, and the covariance matrices Cnm 
and Cm are calculated as 

[Ca/]w = C(x„,x„/) (3.39) 

and 

[CNM]nm = [^n]m = C(x„,Xm). (3.40) 

In order to simulate the full covariance matrix, the matrix 

A = Diag(diag(C^ - C^mC^/Cm^)) (3.41) 

is also needed, where Cat is the full N x N covariance matrix, although 
only the diagonal elements are calculated. The elements of the covariance 
vector k are calculated from the coordinates of the pseudo-inputs and the 
test point x.,,: 

fc^ = C(x^,x,). (3.42) 
The pseudo-covariance matrix of the sparsified data set is 

Qm = Cm + Cm7v(A + a^iy'CNM, (3.43) 
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which can now be used to predict the function value and the error estimate 
at the test point as 

i = k^Q]i/CMJv(A + a%-H (3.44) 
al = C(x„x,) - kJiCjl - Qjl)k + a\ (3.45) 

In order to obtain an optimal set of hyperparameters and pseudo-inputs, 
the likelihood function 

logL = -^t'^iCNMCM^CMN + A + a^iy't 

1 n 
- - log IC^mC^^Cmat + a + an\ - - log27r (3.46) 

is maximised in the space of hyperparameters and pseudo-inputs. 

In our work, observation of single function values is not possible, i.e. 
only total energies (sum of atomic energies) and forces (sum of derivatives 
of local energies) are accessible. Depending on the number of atoms in the 
cell, in the case of total energy observations, and the number of atoms within 
the chosen cutoff radius, in the case of force observations, a large number of 
input values has to be added to the training set, regardless of whether the 
neighbourhood of a particular atom is different from the ones previously 
encountered. Thus in our case, the sparsification process is crucial in order 
to develop a tractable computational scheme. 



4 Interatomic potentials 



4.1 Introduction 



A wide variety of models have been developed to describe atomic interac- 
tions, ranging from the very accurate and extremely expensive to the fast 
but very approximate. Quantum Mechanics ultimately provides a true de- 
scription of matter via solving the Schrodinger equation, but even in its 
crudest approximation, the use of Quantum Mechanics is limited to a few 
hundreds of atoms or a few hundreds of different configurations, which is 
inadequate to sample the entire phase space of a system. A series of further 
simplifications leads to the realm of analytic potentials that can be used 
to describe larger systems or more configurations. The so-called empirical 
potentials are based on fixed functional forms, which are equally based on 
theoretical considerations and intuition, making the creation of new poten- 
tials a combination of "art and science" |23] . Analytic potentials can be 
described as non-linear parametric regression from the statistical point of 
view, where the fitting process is based on experimental or quantum me- 
chanical data. Further, the parametric formula that is chosen to describe 
the behaviour of the real system is often fitted to reproduce some well-known 
equilibrium properties, such as the lattice constant and elastic constants of 
the bulk material or the structure of a liquid, and it is assumed that the 
same function will perform well in very different configurations. This clearly 
implies that analytic potentials are expected to be able to extrapolate to 
very different environments on the basis of the physical insight used when 
the particular functional form was chosen. Even if there exists such a func- 
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tional form, it follows from the overly complicated nature of regression in 
such high dimensions that finding the right form and fitting it to each new 
interesting material is extremely difficult. Our work focuses on the develop- 
ment of a potential based on non-linear, non-parametric regression methods 
that infers the interactions directly from quantum mechanical data, though 
the approach can be adopted irrespective of the origin of the data. 



4.2 Quantum Mechanics 

In the general case, the Schrodinger equation takes the form 

^h^^^^ = H^ir,t), (4.1) 

where is the time-dependent wave-function, r contains the coordinates 
of all the particles in the system and H is the Hamiltonian operator. The 
Hamiltonian can be written as 

^-E^VJ + V'W, (4.2) 

i 

where V{r) is the potential energy. The standing wave solution of the time 
dependent Schrodinger equation is 

^{r,t)=ij{r)exp(~], (4.3) 



which leads to the time-independent form of the Schrodinger equation 

Hipir) = Eiljir). (4.4) 



Atomic systems consist of electrons and nuclei, hence equation 4.2 becomes 



a_9 clcc. 9 dec. nuclei o 



^ 2me ' ^ Ti,- ^ V 



e 



nuclei 2 



E^^i + E^^^^— (4-5) 

A 2m^ ^ TAB 
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where and are the mass and the charge of an electron, and Za 
are the mass and atomic number of the nucleus A. The Born-Oppenheimer 



approximation further simplifies the solution of equation AA by assuming 
that the coupling of the electrons and nuclei is negligible. The basis of 
this assumption is that the mass of the nuclei is at least three order of 
magnitudes larger than the mass of the electrons, thus the electrons adapt 
to the nuclei adiabatically. The Born-Oppenheimer approximation can be 
expressed as 

f-2 elec. 2 elec. nuclei 2 \ 

i i<j ■' i A / 

nuclei n 



+ V ZaZb— = ^e(R)V^(r,R) 

(4.6) 



A<B 



- E + ^e(R) j x(R) = i?x(R), (4.7) 

where the electronic wavefunction ip{Y, R) only depends on the coordinates 
of the electrons r and the coordinates of the nuclei R are regarded as pa- 



rameters. The solutions of equation 4.6, the so-called electronic Schrodinger 



equation provides the potential energy surface (PES) £'e(R), which de- 
scribes the interactions of the nuclei. The nuclear Schrodinger equation 
is often replaced by the classical equations of motion. 



4.2.1 Density Functional Theory 

The analytic solution of the electronic Schrodinger equation is impossible 
for systems more complicated than a hydrogen molecular-ion H^. There 
exists a wide range of methods that are concerned with determining the 
electronic structure, ranging from the very approximate tight-binding|24j 
approach to the essentially exact full configuration interaction [25] method. 
In our work, we used Density Functional Theory as the underlying quantum 
mechanical method. 
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Density Functional Theory aims to find the ground state electron density 
rather than the wavefunction. 

p(r) = J\ip{v,r2,...,rN)\'^dr2 ... dvN (4.8) 

The density depends only on three spatial coordinates instead of 3A^, reduc- 
ing the complexity of the task enormously. The Hohenberg-Kohn principles 
prove that the electron density is the most central quantity determining the 
electronic interactions and forms the basis of an exact expression of the elec- 
tronic ground state. 



4.2.1.1 The Hohenberg-Kohn principles 

The basic lemma of Hohenberg and Kohn[26] states that the ground state 
electron density of a system of interacting electrons in an arbitrary external 
potential determines this potential uniquely. The proof is given by the vari- 
ational principle. If we consider a Hamiltonian Hi of an external potential 
Vi as 

Hi = f + U + Vi, (4.9) 

where T is the kinetic energy operator and If is the electron-electron inter- 
action operator. The solution of the Schrodinger equation 

Hiip = Eip (4.10) 

is the ground state wavefunction ipi, which corresponds to the electron 
density pi. The ground state energy is then 

El = {iJi\Hi\i^i) = J Vi{r)p{r) + (V^i|f + U\^i). (4.11) 

Considering another potential V2, which cannot be obtained as Vi+constant, 
with a ground state wavefunction ip2, which generates the same electron 
density, the ground state energy is 

E2= [v2{r)p{r) + {ij2\f + U\^2). (4.12) 
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According to the variational principle, 

E, < {'^2\Hi\ip2) = j 14(r)p(r) + {^2\f + U\^2) 

^E2 + J[V,{r)-V2{r)]p{r) (4.13) 

and 

E2 < i^mi^i) = J V2ir)p{r) + (V'i|T' + C/|V'i) 

-Ei + j [V2{r) - K(r)] p(r). (4.14) 

By adding the two inequalities together, we find the contradiction 

Ei + E2<Ei + E2. (4.15) 

This is the indirect proof that no two different external potentials can gen- 
erate the same electron density. 

The second Hohenberg-Kohn theorem establishes a link between the 
total energy and the electron density, namely that there exists a universal 
energy functional, which is valid for every external potential, and its global 
minimum corresponds to the ground state of the system and the ground 
state electron density. To prove this theorem, we write the total energy 
functional as a universal functional 

E[p] = Fhk[p] + j V(r)p(r) + Ezz, (4.16) 

where Fhk applies to every electronic system. It determines the entire 
electronic energy except the energy due to the external potential V{r). 
Ezz is the interaction between the nuclei. The ground state energy is given 

by 

E^{^\H\i;) = E[p]. (4.17) 

According to the variational principle, changing the wavef unction to a dif- 
ferent which in turn corresponds to a different electron density p', the 
resulting energy 

E <E' ^ {ijj'lHljjj'), (4.18) 
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is greater than E, thus p cannot correspond to the exact ground state. 

We note that the ground state wavefunction can be found from the 
variational principle 



where ^ is a trial wavefunction. The variational principle can be reformu- 
lated in terms of trial densities, p: 



4.2.1.2 The self-consistent Kohn-Sham equations 

The Hohenbcrg-Kohn principles provide the theoretical basis of Density 
Functional Theory, specifically that the total energy of a quantum mechan- 
ical system is determined by the electron density through the Kohn-Sham 
functional. In order to make use of this very important theoretical finding, 
Kohn-Sham equations are derived, and these can be used to determine the 
electronic ground state of atomic systems. 

The total energy of a system of interacting electrons in the external 
potential of the classic nuclei can be written as 



where T[p\ is the kinetic energy functional, £^xc is the exchange-correlation 
functional, £'h is the Hartree interaction between electrons, Eze is the in- 
teraction between the electrons and the nuclei and Ezz is the nuclei-nuclei 
interaction. The latter three energies have the forms 



E ^imn7{tP\T + U + V\iP), 



(4.19) 



E = mmpE[p\ 



(4.20) 



E[p] = T[p] + E^[p] + E,,[p] + Eze[p] + E, 





(4.22) 




nuclei 



(4.23) 




nuclei 



(4.24) 
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whereas the exact form of functionals T[p\ and -Bxcfp] is not specified by the 
theory. However, according to the Hohenberg-Kohn principle, any system of 
interacting electrons can be described as a system of independent electrons 
moving in an effective potential, meaning that the kinetic energy functional 
can represented by the kinetic energy of non-interacting electrons, Ts- The 
difference between the true kinetic energy functional and Tg 

AT = T[p] - Ts (4.25) 

is included in the exchange-correlation functional, which still needs to be de- 
termined. The non-interacting kinetic energy operator Ts is simply written 
as 

~2 elec. 

Ts--^J2<^n\Vl\^n), (4.26) 
^ n 

where ipn are the independent electron orbitals. The one-electron orbitals 
determine the charge density as 

p{r) = J2rn{r)Mr)- (4.27) 

n 

Hence the ground state will correspond to the electronic density at which 
the functional derivative of the total energy with respect to ipn is zero, while 
maintaining the orthogonality constraints 

mi^j) = Sij (4.28) 

via the Lagrange multipliers e^j. Thus minimising the energy functional and 
the constraints 



leads to the Kohn-Sham equations. 



0, Vn (4.29) 
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which can be solved as n independent equations, 

An< + Zs^n = eWn, (4-31) 

since there exists a basis set where the energy matrix is diagonal. Although 
the minimisation can be performed directly, as implemented in CASTEP 
as conjugate gradients for insulating systems or EDFT[27], an iterative 
approach is more often used. The effective potential Vcs depends on the 
electronic density, thus it is calculated using some initial guess for the den- 
sity, then the Kohn-Sham equations are solved, resulting in a new density. 
This process is repeated until the electron density becomes self-consistent. 



4.3 Empirical potentials 



The Born-Oppenheimer approximation, as given in equation 4.6, suggests 
that when considering solely the interactions between the nuclei, the elec- 
trons do not have to be explicitly taken in account. The reason why the 
Schrodinger equation has to be solved in many applications is the need for 
the accurate description of the Potential Energy Surface provided by Quan- 
tum Mechanics. If there were an alternative way to determine the Potential 
Energy Surface felt by the nuclei ^(R) = -E(R), Quantum Mechanics could 
be bypassed entirely. Empirical potentials, as well as our research, aim to 
achieve this. 

4.3.1 Hard-sphere potential 

The simplest interatomic potential is the hard-sphere potential, that can 
be characterised as 

/ X I if r < rn , , 

Vir) = \ . - , (4.32) 

where tq is the radius of the sphere. Even this simple functional form can 
describe the fact that atoms repel each other due to the Pauli exclusion 
principle, albeit in a rather crude way. As this potential completely lacks 
attractive terms, its use is usually limited to bulk phases. The hard-sphere 
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model is often used for testing purposes, as despite of its simplicity, a system 
of hard-spheres shows a fluid-solid phase transition [28l 129] . More recently, 
systems of colloid particles were also modelled as hard spheres [301I3T] . and 
the results of these simulations have received strong experimental support. 

4.3.2 Lennard-Jones potential 

The Lennard-Jones potential 



was originally introduced to describe the interaction between argon atoms [32] 
The two terms in the expression are the repulsion due to Pauli exclusion 
and the attraction which arises from dispersion interactions. The vari- 
ation is obtained by considering the interaction of two induced dipoles on 
closed-shell atoms. Although the r~^^ term has been introduced primarily 
because it is the square of the other term — therefore its computation is very 
efficient — , and has no theoretical justification, the Lennard-Jones potential 
reproduces the properties of argon remarkably well [33] • In the case of other 
noble gases, quantum effects (for He and Ne), contribution from the interac- 
tion of higher order moments and relativistic effects (for Kr, Xe, Ra) become 
more significant and so the Lennard-Jones model is not so successful. The 
Lennard-Jones potential has been applied to different types of systems, be- 
cause of the ease of computation and the strong physical basis. Potentials 
for ions are often built as Lennard-Jones spheres and point charges [3H ES]) 
the most successful water models are based on partial charges and Lennard- 
Jones term(s)[36l I3Z], or even groups of atoms, such as methyl groups are 
modelled as a single Lennard-Jones particle [3B]- While being a relatively 
simple potential, systems composed of Lennard-Jones particles show com- 
plex phase behaviour, which makes the use of this potential attractive as 
test systems in such studies and method development [391 HOl 1^ . 




(4.33) 
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4.3.3 The embedded- atom model 

The embedded- atom model was developed by Daw and Baskes[42j and was 
originally intended to describe metallic systems. In general, the potential 
takes the form 

where pi is the electron density at the centre of atom i due to the atoms at 
neighbouring sites 

where pj is the electronic density of atom j. F is the embedding functional 
and $ represents the core-core repulsion. This potential is derived from 
density functional theory, where the electron density is approximated by a 
sum of atomic contributions and the energy functional is substituted by a 
simple analytic function. The parameters in the embedded atom potentials 
used in the original applications were fitted to experimental observables, 
such as lattice constants and elastic moduli. 

More recently, a particularly interesting new formulation of the embedded- 
atom model, called the force-matching method has been published by Er- 
colessi and Adams [13] . In this work, no prior assumptions were made on 



the actual functional forms in equations 4.34 and 4.35 All functions were 
described by splines, and the splines were fitted such that the difference 
between the forces predicted by the model and the forces determined by 
first-principle calculations is minimal. This method is an early example of 
using a flexible regression for building interatomic potentials. The differ- 
ences between the forces predicted by the Ercolessi-Adams potential and 
Density Functional Theory are remarkably small in bulk fee aluminium, 
although the description of surfaces is less accurate. 

4.3.4 The modified embedded-atom model 



Although the embedded atom model proved to be a good potential for 
metallic systems, it fails to describe covalent materials, such as semicon- 
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ductors. The reason for this is that the electron density in equation 4.35 
is assumed to be isotropic, which is a good approximation in close packed 
systems, like fee crystals, but in the case of covalent bonds, the electron den- 
sity is higher along the bonds. In order to correct this, an angle-dependent 
density term was introduced by BaskesjH] for silicon 



Pi = ^pirij)+ ^ p(rij)p(rifc)5f(cos%fc), (4.36) 

where djik is the bond angle between the ji and ki bonds. The original 
formulation used the fixed functional form 

5f(cos 9jik) = 1-3 cos^ 9jik (4.37) 

for the angle-dependency, which biased the equilibrium bond angle prefer- 
ence to tetrahedral angles, resulting in a poor description of liquid or non- 
tetrahedral phases of silicon. Lenosky et al. adopted the force-matching 
method for the modified embedded-atom mo del ^5]. 

Taylor showed an elegant generalisation of the modified embedded atom 
model in [Hj. In this work, he formulated a Taylor-expansion of the total 
energy functional around the ground-state density of atoms in terms of 
density variations, which led to a general expression for the total energy 
of the system as a function of the atomic coordinates. The energy of an 
atomic system is determined as a functional of the atomic density as 

E = ^p{r)], (4.38) 

where 

p(r) = ^(5(r-r,) (4.39) 

i 

and 6 is the Dirac-delta function. This form is, in fact, an alternative de- 
scription of the total energy as given by Density Functional Theory. The 
atomic density determines, through Poisson's equation, the external poten- 
tial through which the electrons move as 

Wext = --, (4.40) 
Co 
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which in turn corresponds to a ground state electron density and a total 
energy. If Eq is the minimum of the total energy with respect to the atomic 
density, the energy can be expressed in a Taylor series in variations in the 
density p = po + 6p as 



E = Eo + 



6E 



(5p(r)dr + j j 



6^E 



5p^ 



(5p(r)(5p(r')drdr + . . . . (4.41) 



The density variation Sp is given by 



Sp{r) = J2[S{r-r,)-S{r-r% 



(4.42) 



where r° are the equilibrium positions of the atoms, corresponding to the 
ground state atomic density. The first-order term in equation |4.41 dis- 
appears because the Taylor-expansion is performed around the minimum. 



Substituting 4.41 in equation 4.42 then integrating results in 



E = Eo = J2 



52$ 



52$ 



5p2 



5p2 



+ 



52$ 



5p2 



(4.43) 



Introducing the new functions 



52$ 



5p2 



and 



9iri) = E 



^2$ 



5p2 



E 



52$ 

5p2 



(4.44) 



(4.45) 



J.0 J.. 



we can write the total energy as a sum of one- and two-body terms 

E = E', + Y1 9{^^) + E r,) + . . . . (4.46) 
Similarly, if we consider the local atomic densities around atom i 



(4.47) 
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where w is a screening function, we obtain the total energy expression up 
to second order 



i i j^i 

X] X] 5Z f^^^i' Yik)w{rij)w{rik). (4.48) 



This expression has the same form as the modified embedded atom model. 
Taylor represented the local atomic density by bond-order parameters and 
different radial functions as discussed in section |2.3.1 By choosing appro- 
priate radial functions, he obtained the original modified embedded-atom 
formula, but systematic improvement of the formula is also possible in his 
framework. 



4.3.5 TersofF potential 



The form of interatomic potential suggested by Tersoff |16] is an example of 
the wider family of bond-order potentials [17] . The total energy is written 
as a sum of pair like terms. 



E 



Vij = fcnt{rij)[fK{rij) + bijfA{rij)] 



(4.49) 
(4.50) 
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where /r and /a are repulsive and attractive terms, /cut is a cutoff function, 



and bij is tfie bond-order term 



fR{rij) = Aijexp^-XijVij) (4.51) 
fA{rij) = -Bijexp{~fiijrij) (4.52) 



/cut(r.,)=<^ i + icos(7r|p5|) if i?,, < r,, < (4.53) 



if Tij > Sij 



% = x.,(i+/3ra;o'/'"' (4.54) 

Cij = ^ fcnt{rik)uJikg{Oijk) (4.55) 



Tlie resulting potential is, in fact, a many-body potential, as the bond-order 
terms depend on the local environment. Bond-order potentials can also be 
derived from a quantum mechanical method, tight-binding [4 7j and can be 
regarded as an analytical approximation of the solutions of the Schrodinger 
equation. 



4.4 Long-range interactions 

The electrostatic contribution to the total energy is often not negligible. If 
there is charge transfer between atoms or polarisation effects are significant, 
the interaction between charges, dipoles or even higher order multipoles 
needs to be calculated. There are well-estabhshed methods to determine the 
electrostatic energy and forces, such as the Ewald-summation technique |48j . 
The central question is the values of the electric charges and multipoles in a 
particular model. In many cases fixed charges are used, for example, most 
water potentials |19] and models of ionic crystals [50] have predetermined 
charges. Classical water potentials describe the structure of bulk liquid 
water well, however, the representation of solutions is often poor due to the 
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fact that these models no longer describe the interactions correctly in the 
modified environment and the resulting electric fields. 

The electronegativity equalisation method [51] and the charge equilibra- 
tion method [S2] were designed to introduce charges which depend on the 
atomic environment and the local electric field. The atomic charges pre- 
dicted by these methods agree well with the experimental values and with 
the ones determined by quantum mechanical methods for ionic crystals and 
organic molecules. 

Electrostatic models including multipoles have also been developed. The 
multipoles are often deduced from the electronic structure determined by 
ab initio methods, for example, by using Wannier functions The de- 
pendence of the multipoles on the local electric field is accounted for by 
including polarisability in the model. An example of a polarisable model is 
the shell model, where a charge is attached to the atom by a spring, hence 
the dipole of the atom reacts to changes in the local electric field. 

4.5 Neural network potentials 

Behler and Parrinello presented a new scheme for generating interatomic 
potentials using neural networks that are trained to reproduce quantum me- 
chanical data[3]. The main assumption of the model is that the total energy 
of an atomic system can be described as a sum of atomic contributions 



where each individual term Ei depends only on the configuration of the 
neighbouring atoms within a given cutoff distance. This local environment 
is represented using a set of symmetry functions 




(4.57) 





i 



2'-^^ 5^5^(1 + A;3 COS 



exp[-r7;3(rj + r^^ + r|J]/cut(rij)/cut(nfc)/cut(?^ife), (4.59) 
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where the cutoff function is 



fcnt{r) 



if Tij < Tcut 
if Tij > Tcut 



(4.60) 



Thus the atomic local energies Ei depend on the set of symmetry variables 
{Gj°',G'^^}a,i3 in an unknown way. Instead of trying to find a parametric 
model for this function, Behler and Parrinello used non-parametric regres- 
sion via neural networks. The input data used to perform the regression is 
a set of total energies from reference calculations, in this case these were 
Density Functional Theory calculations of different configurations of bulk 
silicon. The parameters in the layers of the neural network were optimised 
such that the difference between the reference energies and the energies 
predicted by the neural network is minimal. The resulting potential can 
then be used to describe an arbitrary number of silicon atoms. For each 
atom, the symmetry variables are first determined, then these are fed to the 
neural network and the neural network predicts the atomic energies, which 
are added together to obtain the total energy. 



4.6 Gaussian Approximation Potentials 



Our aim is to formulate a generic interatomic potential, which can be reli- 
ably used in a wide variety of applications. Arguably, Quantum Mechan- 
ics is such an interatomic potential, as it provides ab initio data that, to 
our current knowledge, is ultimately correct to the extent that any inac- 
curacies are due to the limitation of the Born-Oppenheimer approxima- 
tion or the employed quantum mechanical model. The great advantage of 
quantum mechanical methods is that they have true and proven predictive 
power, whereas classical potentials can be regarded as parametric regres- 
sion formulas that, in general, cannot be used outside their fitting regime, 
which usually cannot be unambiguously classified. However, the solution of 
quantum mechanical equations is computationally expensive, which limits 
the use of Quantum Mechanics to a modest number of atoms and a few 
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nanoseconds of simulation time — woefully inadequate for biomolecular and 
nanotechnological applications. 

As in the case of other interatomic potentials, we base Gaussian Approx- 
imation Potentials on the assumption that the total energy of the system 
can be written as a sum of two terms: the first is a local, atomic contribution 
and the second is the long-range, electrostatic part 



atoms atoms ^ 

^= E^^ + o E^^^^-' (4-61) 

where the operator L can be written as 

L, = qi + pi■V^ + Qi■. ViVi + ■ ■ • , (4.62) 

and Qi, Pi and Qj denote the charge, dipole and quadrupole of the i-th atom, 
respectively. We formulate the locality of the atomic energy contributions 

as 

ei=e{{rij}), (4.63) 

where only the relative positions r^j of the neighbouring j atoms within a 
spherical cutoff are considered. In atomic systems, for which charge trans- 
fer between atoms and polarisation effects are negligible, we can simply 



drop the second term in equation 4.61 We note that short-range, well 



screened electrostatic effects can be implicitly merged into the first term in 



equation 4.61 without great sacrifices in accuracy. 

The strict localization of e enables the independent computation of 
atomic energies. 

The central challenge in the development of interatomic potentials is 
finding the form of e{{rij}). In our approach, we do not make any prior 
assumptions about the functional form of the potential. Instead, we use 
non-parametric, non-linear regression in the form of a Gaussian Process to 
find the function values at arbitrary values. In the regression, quantum me- 
chanical data, such as total energies and atomic forces are used as evidence. 
Gaussian Approximation Potentials can be regarded as interpolation of the 
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quantum mechanical potential energy surface. Moreover, the Gaussian Pro- 
cess framework allows us to to build into the model a strong bias, namely, 
that the atomic energy function is smooth. 

The advantage of Gaussian Approximation Potentials is that they are 
very flexible. In contrast to analytic potentials, the accuracy of Gaussian 
Approximation Potentials can be improved by adding more quantum me- 
chanical data at various points in configurational space without changing 
the fit globally. As the Gaussian Process predicts its own accuracy, it is pos- 
sible to use it as a "learn on the fly" method, i.e. if the predicted variance 
of the energy of the force in the case of a new configuration is higher than 
a pre-set tolerance, the energy and forces for the new configuration can be 
calculated using Quantum Mechanics, then the obtained data is added to 
the database in order to improve the fit. The flexibility of the fit ensures 
that the best possible fit is achieved for any given data. 

The Gaussian Approximation Potential scheme is similar to the Neural 
Network potentials introduced by Behler and Parinello [5| , as both uses non- 
linear, non-parametric regression instead of fixed analytic forms. However, 
the representation of the atomic environments in GAP is complete and the 
Gaussian Process uses energies and forces for regression. Moreover, the 
training of the neural network involves the optimisation of the weights, 
whereas the training in the case of Gaussian Process is a simple matrix 
inversion. 



4.6.1 Technical details 

The atomic energy function e depends on the atomic neighbourhood, but 
it is invariant under rotation, translation and permutation of the atoms. 
One of the key ideas in the present work is to represent atomic neigh- 
bourhoods in a transformed system of coordinates that accounts for these 
symmetries. Ideally, this mapping should be one-to-one: mapping differ- 
ent neighbourhood configurations to the same coordinates would introduce 
systematic errors into the model that cannot be improved by adding more 



quantum mechanical data. In section 2.3 we described a number of trans- 
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formations that can be adapted to construct an invariant neighbourhood 
representation. For our work, we have chosen the four dimensional bispec- 
trum elements. In order to ensure that the representation is continuous in 



space, we modified the atomic density in equation 2.18 to 



p.(r) = S{r) + J2S{r~ v,,)U{n,), (4.64) 



where /cut is a cutoff function, in our case 

1/2 + cos(7rr/rcut)/2 if r < rcut 

In Quantum Mechanics, atomic energies are not directly accessible, only 
the total energy of a configuration and the forces on each atom can be 
determined. The forces contain cross-terms of the derivatives of the local 
energies. The force on atom i can be obtained by differentiating the total 
energy with respect to the Cartesian coordinates of atom i, written as 

„ atoms ^ 

3 

As £ = for any r^j > rcut, this summation only runs over the Ni neighbours 
of atom i. The atomic energies depend directly on the bispectrum elements, 
which are determined by the neighbourhood, thus the force becomes 

where hk is the fc-th element of the bispectrum vector, and is the bispec- 
trum of atom j. Therefore we can substitute total energy observations in the 
form of sums of atomic energies, and forces, in the form of sums of deriva- 



tives of atomic energies, directly in the formulae shown in sections 3.2.3 and 

If is the number of teaching points, the computational resources re- 
quired for Gaussian Process regression scales as A^^ for training and as A^ 
for predicting values and as A^^ for predicting variances. Due to the fact 
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that we cannot add single atomic energy observations to the database, only 
total energies or forces, the size of the training set and therefore the com- 
putational costs would grow enormously. For example, if we intend to add 
configurations with defects to a database that up to this point contains 
data for bulk atoms only, we have to add all the atomic neighbourhoods 
in the configuration that contains the defect, despite of the fact that most 
of them are redundant because they incorporate the bulk data that is al- 
ready in the database. Similarly, a single configuration can contain many 
correlated neighbourhoods. 

A possible solution for this problem was given by Snelson and Ghahramani [22] 
and it was described in section 3.2.5[ By choosing M sparse points from the 
complete training set, the computational resources required for the training 
process scale as NM"^, while the cost of the prediction of function values 
and variances scales as M and M^, respectively. 



4.6.2 Multispecies potentials 

It is possible to extend the scope of Gaussian Approximation Potentials to 
cases where there are more than one atomic species present in the system. 
There are two main differences with respect to the method described above 
for monoatomic potentials. On the one hand, the different species have to 
be distinguished in the atomic neighbourhood while retaining the rotational 
and permutational invariance, and, on the other hand, charge transfer be- 
tween different types of atoms might occur, in which case the long-range 
interactions have to be taken in account. The latter is not necessary in 
every multispecies system, for example, in hydrocarbons or metallic alloys 
there are no significant long-range interactions present [M] . 



By modifying the atomic density function in equation 2.18 as in equa- 
tion [m 

p,(r) = Si5iO) + J2 Sj^i^ - i-ii), (4.68) 
j 

where the different species are distinguished by the different weights of the 
Dirac-delta functions. The bispectrum of pi remains invariant to the global 
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rotation of the atomic neighbourhood and to permutations of atoms of the 
same species. 

In this study, we have not developed any potentials that contain elec- 
trostatics explicitly, but there is good evidence [55]. that electrostatic pa- 
rameters, such as charges and multipoles can be obtained from electronic 
structure calculations. It is possible to fix these parameters, but in general, 
the charges and multipoles will be determined by the local neighbourhood 
and the local electric field, and so these effects must be incorporated any 
accurate potential. This branch of our research awaits implementation. 



5 Computational methods 



5.1 Lattice dynamics 

5.1.1 Phonon dispersion 

Crystalline materials are composed of periodic replicas of unit cells. In our 
case, the unit cell is a parallelepiped defined by the edge vectors ai, a2 and 
as. The volume of the unit cell is the absolute value of determinant of the 
lattice matrix A = [ai,a2,a3], which is nonzero, as the column vectors of 
the matrix are linearly independent. The smallest unit cell is called the 
primitive cell. The positions of the atoms in the primitive cell form the 
basis of the crystal. 

The crystal is built by translating the primitive cell by all the translation 
vectors 



where h, I2 and I2, are integers. Hence the equilibrium position of the i-th 
atom in the crystal can be written as 



At finite temperature, atoms vibrate around their equilibrium positions, 
and their displacement can be described by a small vector u. The actual 
position of an atom is given by 



Ri = /lai + 123L2 + ha.^, 



(5.1) 




(5.2) 



(5.3) 
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The total potential energy of the crystal is a function of the positions 
of the atoms. The Taylor-expansion of the potential energy is 

= 00 + ^ (pljaUlja + ^ (plja,Vj'a'UljaUl'j'a' + • • • , (5.4) 



where 0o is the equilibrium energy. The first term in equation |5.4| is the 
related to the force through 



du 



\ja- 



(5.5) 



This term is zero, because we perform the Taylor expansion around the 
minimum. The second term contains the harmonic force constants, given 

by 

In the harmonic approximation, higher order terms in the Taylor-expansion 
are neglected. Newton's equations of motion are therefore written as 



U\ja — 01ja,l 



'j'a'U'Vj'a', 



(5.7) 



\'j'a' 



which have wavelike solutions 

Ui 



= ^^E^(k,z/)e(k,z/,j)exp [^(kr° -a;(k,z/)t)] . (5.8) 



Substituting |5.8| in |5.7[ we obtain the eigenvalue equation 

. , ] ea'(k,i/', j' 



u^{k)ea{ki,u,j) = ^ D^o,' I 

a'j'u' \ 



(5.9) 



where D is the dynamical matrix, the Fourier transform of the force con- 
stant matrix: 



■D r\rv' 



V ) = ^ ^ exp [zk(r°. - r?,., 



,)] . (5.10) 



Non-trivial solutions of equation 5.9 can be found by solving the secular 
determinant 

|D(k) - w^ji _ 0, (5.11) 
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where the solutions are the frequencies of different phonon modes at wavevec- 



tor k. Substituting these solutions into |5.9[ the mode eigenvectors can also 
be obtained, and these correspond to the normal modes of the vibrations. 
A more complete discussion of lattice dynamics can be found, for example, 
in |56]. 

In our work, we first constructed a large supercell from the primitive cell, 
then perturbed each atom in the original 1 = (0,0,0) cell by a small amount 
along the coordinate axes and calculated the forces on the atoms in the per- 
turbed supercell. We obtained an approximate force constant matrix by the 
numerical differentiation of the forces, which we Fourier-transform to obtain 
the dynamical matrix. This procedure can be performed using any inter- 
atomic potential model, although using Quantum Mechanics can be partic- 
ularly expensive in the case of large supercells, i.e. for small wavenumbers. 
However, this large computational cost in DFT can be avoided by calcu- 
lating phonon dispersion relations using Density Functional Perturbation 
Theory, as described in [57] . 

5.1.2 Molecular Dynamics 

Alternatively, the phonon frequencies can also be obtained from molecular 



dynamics runs|58j. The relative displacements Ui^ in equation 5.8 can be 
Fourier-transformed, leading to 



eki(t) = ^J— ^^exp(-ikRi)uij oc exp(-ia;(k, z/)t), (5.12) 

^Vcell , 

3 1 u 

where A^^ceii is the number of primitive cells in the supercell. Fourier- 



transforming equation |5.12| to frequency space gives 

ek(w) oc ^(5(w - Wk,!/)- (5.13) 

V 

The spectral analysis of ek(w), i.e. finding sharp peaks in the power spec- 
trum 

Pk, = |ek,(u;)r (5.14) 

gives the phonon frequencies. 
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The advantage of this method is that it can be used for more comph- 
cated systems, where exphcit calculation of the full dynamical matrix would 
be extremely expensive. Furthermore, we can calculate the temperature de- 
pendence of the phonon spectrum by simply performing molecular dynamics 
simulations at different temperatures. The temperature dependence of the 
phonon spectrum is due to anharmonic effects, i.e., at larger displacements 
when terms higher than second order contribute to the potential energy in 



equation 5.4 



5.1.3 Thermodynamics 

The quantum mechanical solution of a system of harmonic oscillators [56] 
states that the allowed energies of a phonon mode labelled by k and u are 

Eku = (l+n) w(k,z/), (5.15) 



where h is the reduced Planck constant, and is a non- negative integer. 
The canonical partition function of a system can be calculated as 



Z = 5^exp(-/3E,), 



(5.16) 



where Ej is the energy of the j-th state and (3 = Substituting 
into this expression, we obtain 



5.15 



vib. 



n 

k,i/ 



exp Q +nkJ\ hu(k, 



(5.17) 



which can be simplified by using 

oo 

exp(— nx 

to 



1 



n=0 



^vib. 



n 



1 — exp(— x) 
— exp{—f3hujik, u)) 



In the case of a crystal, the total partition function is 

Z = exp(-/30o)^vib.- 



(5.18) 



(5.19) 



(5.20) 
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The partition function can be used to obtain all thermodynamic quantities. 
For example, the free-energy can be obtained as 

F = -kBT\nZ (5.21) 
= 00 + keT In [2 sinh(/3^(k, z/)/2)] , (5.22) 

]l,U 

and the internal energy is 

= ^° + g G + e.p(- A. .))-l )- (^-2^) 



This result leads us to a rather crude method for approximating the real 
temperature in the case of a classical molecular dynamics run|59j. We 
equate the kinetic energy -Ekin.(7MD) to the quantum mechanical vibra- 
tion energy U^ih.iTQM) and find the temperature Tqm when U^ih.iTQM) = 
-Skin.(^MD)- In the high temperature limit Tqm = Tmd, but this expression 
allows us to relate results from low-temperature molecular dynamics runs 
to experimental values. 

The const ant- volume heat capacity is defined as 

dU 

cv = ^, (5.25) 
which, in the case of harmonic crystals, can be calculated as 

Cy = ^Ck,., (5.26) 

where Ck,v is the contribution to the specific heat from mode (k, u) 



V kBT J [exp{Phu(k,u)) -if ^ ' ' 



The volumetric thermal expansion coefficient can also be calculated from 
the free energy. The thermal expansion coefficient is defined as 
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where kt is the isothermal compressibihty. The pressure is given by 

dV 



P = -[%] , (5-29) 



which leads to the expression 



a = Y'^lk,uCk,u, (5.30) 
where 7k,iy are the k-vector dependent Griineisen parameters 

7k,^ = 71— ^-^w k, u)dV = , 5.31 

a;(k, z/j omV 

which describe the dependence of the phonon frequencies on the lattice 
volume. The linear thermal expansion can be obtained in a similar way 
and the derivation can be easily extended to non-isotropic cases. 

We note that through the Griineisen parameters anharmonic corrections 
of the potential energy are involved in the thermal expansion coefficient. 
The approximation that the vibrational free-energy function depends on 
the volume of the crystal through the change of the phonon frequencies 
described by the first-order approximation 

uik, u, V) = u{k, u, Vo) + ^^^|^A\/ (5.32) 



is usually referred to as the quasi-harmonic approximation 

At low temperatures, if most of the anharmonic effects are due to lat- 
tice expansion, the quasi-harmonic approximation can be successfully ap- 
plied. However, if the average displacement of the atoms is so large that 
the potential energy cannot be approximated by quadratic terms anymore, 
the approximation fails. In such cases, we can use a classical simulation 
method such as molecular dynamics to sample the phase space and calcu- 
late observables using these samples. We should note that this is strictly 
valid only in case of high temperatures, where Tmd ~ ^qm- 

However, if the anharmonic effects are large even at low temperatures, 
precise results can be obtained by methods that treat the quantum char- 
acter of the nuclei explicitly, for example by path- integrals [60] or explic- 
itly solving the nuclear Schrodinger equation[6T]. Path-integral methods 



5.1. LATTICE DYNAMICS 



67 



have been successfully used to calculate the partition function of semicon- 
ductor crystals [62] and hydrogen impurity in metals [63]. Explicit solution 
of the nuclear Schrodinger equation is routinely performed in the case of 
molecules [HI] by using the system of eigenfunctions of the harmonic solution 
to expand the wavefunction. 



6 Results 



6.1 Atomic energies 

The total energy in Quantum Mechanics is a global property of the sys- 
tem consisting of atoms and depends on 3A^ — 6 variables, namely, the 
coordinates of the atoms. However, all interatomic potentials are based 
on the assumption that the energy can be written as a sum of atomic or 
bond energies, which are local and if appropriate, a long-range electrostatic 
component. In our work, we intend to estimate the atomic energies by a re- 
gression scheme based directly on quantum mechanical data. If there were 
a way to extract atomic energies directly from quantum mechanical calcula- 
tions, these could be used in the regression. Firstly, we consider ideas that 
lead to such atomic energies. 

In fact, the existence of atomic energies can be justified by showing that 
the force acting on an atom does not change significantly if the position of 
another atom that is far enough away is perturbed. This statement can be 
formulated as 

WZ.W^,E ^0 as \xj - Xil oo, for Vn, (6.1) 
which we refer to as the "strong locality assumption" . 

6.1.1 Atomic expectation value of a general operator 

The basic idea in the derivation of atomic properties in Quantum Mechanics 
is partitioning the total expectation value of an arbitrary operator by using 
a suitable atomic basis set. This is a generahsation of the MuUiken charge 
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partitioning scheme. We consider a system of non interacting electrons 
moving in an effective potential Ves, which is the case in DFT. Thus the 
expectation value of a general operator O is 

{0)=J2MH0\A), (6.2) 

i 

where is the occupation number of the single-electron orbital ipi. If ipi is 
expressed in an atomic basis {4>a} in the form 

V^i = 5^iVf0„, (6.3) 



a 



we can write equation 6.2 



as 



(O) = EE^^" {^0* (MO\<l>,). (6.4) 

i afi 

Introducing the density kernel K as 

K'^^ = Y.m{N^y (6.5) 

i 

and the matrix of operator O as 

Opa. = {(pp\d\(Pa) (6.6) 

we obtain 

(O) = K'^^Op^ = 5^(KO)„, = Tr (KO) . (6.7) 

Each basis function 0^ belongs to a certain atom, thus we use the parti- 
tioning 

{0)a = Y.{KO)^^, (6.8) 
which conserves the total value 



A 



(6.9) 
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6.1.1.1 Mulliken charges 

The total number of electrons is obtained by setting the operator to O = 1: 
N = Y1 M^^\^^) = E = E ^"'^ / ^-{rWpir) (6.10) 

i i a,j3 

which leads to the well-known expression for the MuUiken-charges 

iV^ = 5^(KS)_, (6.11) 

where the elements of the overlap matrix are defined by 

Sep = j C(r)0/3(r) (6.12) 

6.1.2 Atomic energies 

Substituting the Hamiltonian operator H into equation 6.8, we obtain a 



possible definition for the atomic energies. In the case of Density Functional 
Theory, the operators can be formulated as follows. 
The total energy can be written as 

E[p\ =Ts + En[p] + ^xc[p] + Eze[p] + Ezz- (6.13) 
The independent-particle kinetic energy T, is given by 

^ elec. 

T, = --J2M^^\A,\^^), (6.14) 

i 

thus we need to substitute O = ^ ■ Aj and the matrix elements Ylii^al^i 
have to be calculated to obtain the atomic kinetic energy. 
The Hartree energy is defined by the equation 



drdr' ^';'_^y . (6.15) 



which we rewrite as 



Eh[p] = 1 y"drp(r)FH(r) = ^5^/.(^.|KH|V^.), (6.16) 
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where the Hartree-operator can be obtained as 




(6.17) 



Similarly, the interaction between electrons and nuclei is given by 

Eze[p] = y"drp(r)V;xt, (6.18) 
and the exchange- correlation energy is 

E^,[p] = y"drp(r)e,c[p(r)]. (6.19) 

Hence the operators Kxt and exc[p(r)] are required to calculate the matrix 
elements of the external energy matrix and the exchange-correlation energy 
matrix. 



6.1.3 Atomic multipoles 

In general, the multipole coefficients of an arbitrary charge distribution p(r) 
can be obtained as 

= J^ J drx^.x^^ . . .X5,p(r), (6.20) 

where the Cartesian coordinates. This definition can 

be regarded as an expectation value of the general position operator X, 
therefore it can be substituted into equation |6.8[ to produce the definition 
of atomic multipoles: 

/^A6,?2,-,?i = [| 1 d'^^ii^i2---Hi(t>a{r)(j)}{r), (6.21) 

where x^, is measured from atom A. 

It is interesting to note that the expression for atomic multipoles in 



equation |6.21| can be obtained by defining the atomic charge density pA as 

p^(r) = 5^ir">,(r)0;(r). (6.22) 

a<^A 
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This definition of tlie atomic cliarge density is consistent witli general pliys- 
ical considerations, for example it gives the total electron density when 
summed for all atoms: 



6.1.4 Atomic energies from ONETEP 

ONETEP [6l], the order-N electronic total energy package is a numerical 
implementation of Density Functional Theory. Unlike usual implementa- 
tions of DFT, the computational resources required for the calculation of 
the energy of a particular atomic system scales linearly with the number 
of electrons, which makes it exceptionally efficient in investigations of large 
systems. However, in our work we exploited another feature of ONETEP, 
namely, that it uses local basis functions. 

6.1.4.1 Wannier functions 

The electronic structure of periodic crystalline solids is usually represented 
by Bloch orbitals ■j/'^k, where n and k are quantum numbers of the band 
and crystal momentum, respectively. The Bloch states are eigenfunctions 
of the Hamiltonian of the crystal, obeying the same periodicity. Because 
of the fact that they are usually highly delocalised, it is often difficult to 
deduce local properties from Bloch orbitals, for instance, bonding between 
atoms or atomic charges. 

An equivalent representation of the electronic structure is provided by 
Wannier functions [65], which are connected to the Bloch orbitals via a uni- 
tary transformation. Denoting the Wannier functions of band n of cell R 
by Wnir — R), we express the transformation as follows: 




(6.23) 



A 




(6.24) 



The back transformation is given by 




(6.25) 



R 
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where the sum is performed over all the unit cells in the crystal. 



The Wannier functions obtained in equation |6.25| are not unique, be- 
cause it is possible to mix the Bloch states of different band numbers by 
a unitary matrix U*^'^-'. The resulting Wannier functions are also a com- 
plete representation of the electronic structure, although their localisation 
features are different: 



Wr, 



V 

8^ 



|dke-'^^(5^f/«^^k(r)j. (6.26) 



Since both transformations in 6.25 and 6.26 are unitary, and the original 
Bloch states are orthogonal, the resulting Wannier functions are also or- 
thogonal. 



6.1.4.2 Nonorthogonal generalised Wannier functions 

The matrix U can be optimised in such a way that the resulting Wannier 
functions are maximally localised, as described in [65j. However, orthog- 
onality and localisation are two competing properties, and more localised 
Wannier functions can be obtained if the orthogonality constraint is re- 
moved. 

The linear combination of the Bloch orbitals of different bands can be 
performed by using a non-unitary matrix, resulting in nonorthogonal Wan- 
nier functions [66] 0q,r: 




(6.27) 



In ONETEP, Wannier functions are constrained in a localisation sphere cen- 
tred on atoms, i.e. 0q,r, = outside the localisation sphere, providing an 
atomic basis set. The radius of the localisation sphere is set by considering 
the electronic structure of the system or it can be increased until conver- 
gence of the physical properties is achieved. The nonorthogonal Wannier 
functions are optimised during the electronic structure calculation, hence 
they represent the "best possible" atomic basis functions of a particular 
system. In our studies of the atomic properties, we used these Wannier 



6.1. ATOMIC ENERGIES 



75 



functions as the atomic basis set for calculating atomic properties with our 



definition for these properties given in equation 6.8 



6.1.5 Locality investigations 

In order to use the atomic energies obtained from quantum mechanical 
calculations as the target data of our regression scheme we have to ensure 
that the atomic energies are local. We tested the degree of this locality 
through the variation of the local energy caused by the perturbation of 
atoms outside a spatial cutoff. If the atomic energies are local, they can be 
regarded purely as functions of the local atomic environment and can be 
fitted by the Gaussian Process method. 

The basic idea for testing the degree of the locality is that we generate 
a number of configurations, where the nearest neighbours of a certain atom 
were held fixed, while the positions of other atoms were allowed to vary. 
We calculated the local energy of the atom whose neighbourhood was fixed 
for each of these configurations and compared them. We then repeated this 
process for different neighbourhood configurations. 

As a test system, we used clusters of 29-71 silicon atoms. The configu- 
rations were generated by molecular dynamics simulation at 3000 K, where 
the forces were obtained from the Stillinger- Weber potential [67]. We per- 
formed the electronic structure calculations of the different clusters using 
ONETEP, and we also used ONETEP to determine the atomic energies, as 



described in section 6.1.2 A typical cluster is shown in figure 6.1 

We examined the components of the atomic energies which depend prin- 
cipally on the electron density of the central atom. We calculated the av- 
erage variation of the atomic kinetic, nonlocal and exchange-correlation 
energies and also, the total atomic energy corrected for the long-range in- 
teractions. The atomic energy was calculated as 

^ atoms ^ 

E, = Ef"" + + + + Ef^ - - 5^ LiLj — , (6.28) 
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Figure 6.1: An example configuration of tlie examined Si clusters. The 
atoms which were fixed during the molecular dynamics simulation are shown 
in green. 



where the operator L can be written as 

Li = qi + pi-V, + Qi: ViVi + ■■■. (6.29) 
The variations of the sum of kinetic, nonlocal and exchange-correlation 



terms are depicted in figure 6^ , while figure 6^ shows the spread of atomic 
energies corrected for long-range interactions. Ideally, variations in the 
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Figure 6.2: Atomic energies E = Ef^'^ + Ef°'^^°'^ + Ef^ of silicon clusters for 
different neighbourhoods. 



atomic energies should be within 0.1 eV for each neighbourhood as this is 
usually reckoned to be the standard DFT error. It is obvious that our results 
do not fit into this range. These results are not satisfactory and indicate that 
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Figure 6.3: Atomic energies of silicon clusters corrected for Coulomb con- 
tributions: E = Ei — I y]f°'^^ LiLj— for different neighbourhood configu- 
rations. 



either the atomic energies depend on more neighbours, or that the atomic 
energies calculated by this particular method are not local. However, we 
found when using our final implementation of Gaussian Process (described 
in section 4.6.1) an explicit definition of local energies is not necessary, as 
the Gaussian Process infers these from total energies and forces. We shall 



discuss the inferred atomic energies in section 6.6 



6.2 Gaussian Approximation Potentials 

We have implemented the Gaussian Process to infer atomic energies from 
total energies and atomic forces. Gaussian Processes belong to the family 
of non-linear, non-parametric regression methods, i.e. not having fixed 
functional forms. The atomic environments are represented by the four 
dimensional bispectrum, which is invariant to permutation of neighbouring 
atoms and the global rotation of the environment. In order to demonstrate 
the power of this new tool, we built potentials for a few technologically 
important materials and we examined how closely the fitted potential energy 
surface is to the original, quantum mechanical one. At this stage of the 
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work, most of the configurations we used for the training were close to the 
crystaUine structure of the material, hence the use of the current potentials 
is limited to crystalline phases. However, to show the ability of our potential 
to describe mode widely varying configurations, in the case of carbon we 
built a potential that could describe the sp^ — sp^ transition of the carbon 
atoms, the (111) surface of diamond and a simple point defect. 

Our aim is twofold. On the one hand, we would like to generate po- 
tentials for general use, which can be extended, if needed. On the other 
hand, there are applications where "disposable" force fields are sufficient. 
For example, when simulating a crack or defects in a crystalline material, 
only a restricted part of the potential energy surface is accessible. In these 
cases, a purpose-built potential can be used, which can be generated more 
rapidly. 

6.2.1 Gaussian Approximation Potentials for simple 
semiconductors: diamond, silicon and 
germanium 

Our first application of the Gaussian Approximation Potentials was a set of 
potentials for simple semiconductors. We calculated the total energies and 
forces of a number of configurations, which were generated by randomly 
displacing atoms in the perfect diamond structure. We included 8-atom 
and 64-atom supercells at different lattice constants and we perturbed the 
lattice vectors in some cases. The atoms were displaced at most by 0.3 A. 

The parameters of our representation are the spatial cutoff and the res- 
olution of the bispectrum. We set the former to 3.7 A, 4.8 A and 5.0 A 
for carbon, silicon and germanium, respectively. The resolution of the bis- 
pectral representation can be changed by varying a single parameter, the 
maximum order Jmax of the spherical harmonics coefficients we use when 
constructing the bispectrum. We used J^ax = 5 in all cases. During the 
sparsification, we chose 300 atomic neighbourhoods in all cases. Due to 
the method of generating these configurations all the neighbourhoods were 
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Force error / eV/A 

Figure 6.4: Force errors compared to DFT forces for GAP and the Brenner 
potential in diamond. The left panel shows the force errors at different DFT 
forces. On the right panel, the distribution of the force errors is shown. 



similar, thus we decided to select the set of atomic environments for the 
sparsification randomly. 

The electronic structure calculations were performed using CASTEP |27j. 
We used the local density approximation for carbon and the PBE gener- 
alised gradient approximation for silicon and germanium. The electronic 
Brillouin zone was sampled by using a Monkhorst-Pack k-point grid, with a 
k-point spacing of at most 1.7 A ^ . The plane- wave cutoff was set to 350 eV, 
300 eV and 300 eV for C, Si and Ge, respectively, and the total energies 
were extrapolated for infinite plane-wave cutoff. Ultrasoft pseudopotentials 
were used with 4 valence electrons for all ions. 



In figure 6.4 we show the performance of GAP, compared to the state- 
of-the-art interatomic potential, the Brenner potential [68]. The set of con- 
figurations used for testing was obtained from a long ab initio molecular 
dynamics run of a 64-atom supercell at 1000 K. The absolute values of the 
components of the difference between the predicted and the DFT forces are 
shown as a function of the DFT force components and the distribution of 
these differences is also displayed. The force and energy evaluation with 
the Gaussian Approximation Potential for diamond, in the current imple- 
mentation, is about 4000 times faster than Density Functional Theory in 
the case of a 216-atom supercell. 
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Tersoff GAP Tersoif GAP 




Figure 6.5: Force errors compared to DFT forces for GAP and the Ter- 
soff potential. Tlie silicon potentials are shown in the left panel and the 
germanium potentials in the right. 



We show in figure 6^ the results for our potentials which were developed 
to model the two other group IV semiconductors, silicon and germanium, 
compared to the Tersoff potential. 

The strict localisation of the atomic energies places a limit on the ac- 
curacy with which the PES can be approximated. If we consider an atom 
whose environment inside Vcnt is fixed, but the position of other atoms are 
allowed to vary, the forces on this atom will still show a variation, depend- 
ing on its environment outside the cutoff. An estimate of this theoretical 
limit can be obtained by calculating the force on an atom inside a fixed 
environment in various configurations. For carbon atoms in the diamond 
structure with rcut = 3.7 A this error estimate is 0.1 eV/A. 



6.2.2 Parameters of GAP 

In diamond, we carried out the GAP training process using different pa- 
rameters to determine the accuracy of the representation. We truncated 
the spherical harmonics expansion in equation |2.58| at Jmax, which there- 
fore represents the resolution of the bispectrum. Employing more spherical 
harmonics coefficients requires more computational resources, partially be- 
cause of the increased number of operations needed for the calculation of 
the bispectrum and partially because there are more invariant elements. 
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Figure 6.6: Force correlation of GAP models for diamond with different 
resolution of representation. The number of invariants were 4, 23 and 69 
for Jinax =1, 3 and 5, respectively. 



which affects the calculation of the covariances in equation 3.20 Figure 6.6 



shows the force error of three different GAP models. The cutoffs of all three 
models were 3.7 A, but the spherical harmonics expansion was truncated at 
the first, the third and the fifth channel, respectively. We chose J^ax = 5 for 
our model, as in this case the standard deviation of the force errors reached 
the theoretical limit of 0.1 eV/A associated with the spatial cutoff. 



Figure |6.7| shows the force errors of three Gaussian Approximation Po- 
tential models for diamond with cutoffs of 2.0 A, 2.75 A and 3.7 A. The 
difference between the latter two models is negligible. However, the elastic 
moduli calculated from the model with rcut = 2.75 A did not match the 
elastic moduli of the ab initio model and so we chose rcut = 3.7 A for our 
final GAP potential. 



6.2.3 Phonon spectra 

The force error correlation is already a good indicator of how well our poten- 
tial fits the original potential energy surface. In addition, we determined the 
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Figure 6.7: Force correlation of GAP models for diamond with different 
spatial cutoffs. 
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Table 6.1: Parameters of the used GAP potentials. 



accuracy of a few other properties. The phonon dispersion curves represent 
the curvature of the potential energy surface around the lowest energy state. 
We calculated the phonon spectrum by the finite difference method using 
GAP. The force-constant matrix of the model was calculated by the numer- 
ical differentiation of the forces, and the phonon spectrum was obtained as 
the eigenvalues of the Fourier-transform of the force-constant matrix. The 



parameters of the GAP potentials are given in table 6A We compared the 
phonon values at a few points in the Brillouin zone with the ab initio values 



and the analytic potentials. These results are shown in figures 6.8, 6.9 and 



6.10 for diamond, silicon and germanium, respectively. The GAP mod- 



els show excellent accuracy at zero temperature over most of the Brillouin 



6.2. GAUSSIAN APPROXIMATION POTENTIALS 



83 




0.25 0.5 0.75 1 0.75 0.5 0.25 0.25 0.5 



Koo] K501 Rsy 



Figure 6.8: Phonon dispersion of diamond calculated by GAP (solid lines), 
the Brenner potential (dotted lines) and LDA-DFT (open squares). 



zone, with a slight deviation for optical modes in the (111) direction. The 
agreement of the phonon spectrum of GAP with the phonon spectrum of 
Density Functional Theory suggests that any quantity that can be derived 
from the vibrational free-energy, such as the constant-volume heat capac- 
ity, at low temperatures will also show good agreement. We found excellent 
agreement between the phonon frequencies calculated by the GAP poten- 
tial for diamond and the dispersion curves measured by inelastic neutron 



scattering [Sni EO], shown in figure 6.11 



We also calculated the elastic constants of our models and these are 
compared to Density Functional Theory and existing interatomic potentials 



in table 6^ We note that to our current knowledge, no existing analytic 
potential could reproduce all of the elastic constants of these materials with 
an error of only a few percents. 



6.2.4 Anharmonic effects 

In order to demonstrate the accuracy of the potential energy surface de- 
scribed by GAP outside the harmonic regime, we calculated the temper- 
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Figure 6.10: Phonon dispersion of germanium calculated by GAP (solid 
lines), the Tersoff potential (dotted lines) and PBE-DFT (open squares). 
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Figure 6.11: Phonon dispersion of diamond calculated by GAP (solid lines) 
and experimental data points[69l [70] (open squares). 



ature dependence of the optical phonon mode of the F point in diamond. 
In fact, the low temperature variation of this quantity has been calculated 
using Density Functional Perturbation Theory by Lang et al.[71j. The ab 
initio calculations show excellent agreement with experimental values deter- 
mined by Liu et al. [72] . We calculated this optical phonon frequency using 
a molecular dynamics approach. We first performed a series of constant- 
pressure molecular dynamics simulations for a 250-atom supercell at dif- 
ferent temperatures in order to determine the equilibrium lattice constant 
as a function of temperature. Then, for each temperature, we used the 
appropriate lattice constant to run a long microcanonical simulation, from 
which we calculated the position-position correlation function. We selected 
the phonon modes by projecting the displacements according to the ap- 
propriate wavevector. From the Fourier-transform of the autocorrelation 
function, we obtained the phonon frequencies by fitting Lorentzians on the 



peaks. We present our results in figure 6.12 where our values for the phonon 
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Table 6.2: Table of elastic constants, in units of GPa. 



frequencies were shifted to match the experimental value at K. 

We note that even at K there are anharmonic effects present due to 
the zero-point motion of the nuclei. We accounted for the quantum nature 
of the nuclei by rescaling the temperature of the molecular dynamics runs, 
by determining the temperature of the quantum system described by the 
same phonon density of states whose energy is equal to the mean kinetic 
energy of the classical molecular dynamics runs. The scaling function for 



the GAP model is shown in figure 6.13 



We are aware that at low temperatures this approximation is rather 
crude, and the correct way of taking the quantum effects into account would 
be solving the Schrodinger equation for the nuclear motion. However, we 
note that the anharmonic correction calculated by Lang et al. by Density 
Functional Perturbation Theory[71J and our value show good agreement. 
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Figure 6.12: Temperature dependence of the optical phonon at the F point 
in diamond. 
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Table 6.3: Anharmonic shift of the F phonon frequency in diamond. 



6.2.5 Thermal expansion of diamond 

Another phenomenon that occurs as a result of the anharmonicity of the 
potential energy surface is thermal expansion. The temperature dependence 
of the thermal expansion coefficient calculated from first principles using 
the quasi-harmonic approximation is remarkably close to the experimental 
value at low temperatures. However, at larger temperatures the quasi- 
harmonic approximation is less valid, because other anharmonic effects, 
which cannot be modelled assuming first-order dependence of the phonon 
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Figure 6.13: Temperature of the quantum system described by GAP whose 
energy is equal to the average kinetic energy of the classical system, as a 
function of the temperature of the classical system. The dotted line is the 
identity function f{x) = x, and is merely shown to provide a guide to the 
eye. 



frequencies on the lattice constant, are more significant. This effect can 
be calculated exactly by solving the nuclear Schrodinger equation for the 
nuclear motion, or by classical molecular dynamics simulation. Herrero and 
Ramirez used a path-integral Monte Carlo method to calculate the thermal 
expansion of diamond modelled by the Tersoff potential[62]. We determined 
the thermal expansion by calculating the equilibrium lattice constant by 
running a series of constant-pressure molecular dynamics simulations at 
different temperatures. We fitted the analytic function 




a(T) = ciT + caT^ + c^iT"^ + cq 



(6.30) 
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Figure 6.14: Temperature dependence of the thermal expansion coefficient. 



to the lattice constants, and then calculated the thermal expansion using 
the definition 

-m = ^(^) . (6.31) 



a(T) \dT^ 

The same analytic function was used by Skinner to obtain the thermal ex- 
pansion coefficient from the experimental lattice constants [73j. Our results 



are shown in figure 6.14, together with the experimental values [73] and val- 
ues calculated by LDA and GAP using the quasiharmonic approach. The 
results obtained by using the Brenner potential is shown in the right panel 



of figure 6.14 It can be seen that the thermal expansion is extremely well 
predicted using GAP in molecular dynamics simulations. 

The GAP results for the thermal expansion coefficients obtained from 
the quasiharmonic approximation show excellent agreement with the LDA 
values. This verifies that the potential energy surface represented by the 
GAP model is, in fact, close to the ab initio potential energy surface, even 
outside the harmonic regime. In the case of Density Functional Theory, 
the molecular dynamics simulation would be computationally expensive. 
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because a large supercell has to be used to minimise finite-size effects. How- 
ever, witli GAP, tliese calculations can be easily performed and the thermal 
expansion coefficients obtained match the experimental values well, even at 
high temperatures. 



6.3 Towards a general carbon potential 

The ultimate aim of our research is to create potentials for general use. 
In the case of carbon, describing the diamond phase is certainly not suffi- 
cient. Although we still have to add many more training configurations to 
complete a general carbon potential, we demonstrate the capabilities of the 
GAP scheme by extending the scope of the diamond potential described in 
the previous section to include graphite, surfaces and vacancies. 

We generated a set of randomised graphite configurations in a similar 
fashion to the diamond training configurations. We randomised the atomic 
positions of the carbon atoms in 54- and 48-atom supercells of rhombohedral 
and hexagonal graphite and we also considered a number of uniaxially com- 
pressed supercells. The training configurations also included diamond con- 
figurations with a vacancy and (111) surfaces, in particular, configurations 
of the unreconstructed (111) surface and the 2x1 Pandey-reconstruction 
were included in the training set. 

We tested how accurately the resulting GAP potential reproduces the 
rhombohedral graphite-diamond transition. Fahy et al. described a simple 
reaction coordinate that transforms the 8-atom unit cell of rhombohedral 
graphite (figure 6.15) to the cubic unit cell of diamond. In figure 6.16 we 
show the energies of the intermediate configurations between rhombohe- 
dral graphite and diamond calculated using GAP, DFT and the Brenner 
potential. The lattice vectors and the atomic coordinates of these 
configurations were generated by 

a, = (1 - x)aS'-"P^'*" + a;a:^™^ where a = 1, 2, 3 (6.32) 
= (1 - a;)rf * + xrf ™^ where z = 1, . . . , 8. (6.33) 
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Figure 6.15: Rhombohedral graphite. 

The reaction coordinate x corresponds to graphite at x = and to diamond 
at X = 1. It can be seen that the Brenner potential cannot describe the 
change in the bonding of the carbon atoms, whereas the GAP potential 
reproduces the quantum mechanical barrier accurately. 

We also calculated the energetics of the vacancy migration in a similar 
fashion, i.e. along a linear path between two configurations, where the 
vacancies are at two neighbouring lattice sites. Our results are shown in 



figure 6.17 The GAP model predicts the same the energies as the Density 
Functional Theory, whereas the Brenner potential overestimates the energy 
barrier of the migration. 

Our results for the surface energies of the diamond (111) surface are 



presented in table 6.4 again showing very good agreement between GAP 



predictions and LDA results. 
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Figure 6.16: The energetics of the hnear transition path from rhombohedral 
graphite to diamond calculated by DFT, GAP and the Brenner potential. 




Reaction coordinate 



Figure 6.17: Energy along a vacancy migration path in diamond by DFT, 
GAP and the Brenner potential. 
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4.23 


4.40 
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4.77 



Table 6.4: Surface energies in the units of J/m^ of the unreconstructed and 
2x1 Pandey-reconstructed surface of the (111) diamond surface. 



6.4 Gaussian Approximation Potential for 
iron 

The Gaussian Approximation Potential scheme is not limited to simple 
semiconductors. We demonstrate this by applying the scheme to a metallic 
system, namely the body-centred cubic (bcc) phase of iron. We included 
configurations in the training set where the lattice vectors of the 1-atom 
primitive cell were randomised and where the positions of the atoms in 8 
and 16-atom supercells were also randomised. These configurations were 
represented by 50 sparse points in the training set for the GAP potential. 
The spatial cutoff for the GAP potential was 4.0 A and we used the spherical 
harmonics coefficients for the bispectrum up to Jmax = 6. 

We checked the accuracy of our potential by calculating the phonon 
spectrum along the high symmetry directions and comparing the phonon 
frequencies at a few k-points with Density Functional Theory. These spec- 
tra, together with those generated by the Finnis- Sinclair potential are shown 



in figure 6.18[ In figure 6.19 we compared the phonon frequencies calculated 
by the GAP potential to the experimental values obtained by the neutron- 
inelastic-scattering technique |74] . The main features of the phonon disper- 
sion relation, for example, the crossing of the two branches along the ^, ^] 
direction, are reproduced by the GAP potential. The errors in the frequen- 
cies can be attributed to our Density Functional Theory calculations. 

The elastic moduli calculated with our model, the Finnis-Sinclair potential |75j 



and Density Functional Theory are given in table 6.5 The elastic properties 



and the phonon dispersion relations described by the GAP model show ex- 
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Figure 6.18: Phonon spectrum of iron using the GAP potential (solid lines), 
the Finnis-Sinclair potential (dotted lines) and PBE-DFT (open squares). 




Figure 6.19: Phonon dispersion of iron using the GAP potential (solid 
lines) and experimental values (open squares) [71] . 



6.5. GAUSSIAN APPROXIMATION POTENTIAL FOR GALLIUM 
NITRIDE 95 
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Table 6.5: Elastic moduli of iron in units of GPa calculated using different 
models. 

cellent agreement with the values calculated by Density Functional Theory. 

6.5 Gaussian Approximation Potential for 
gallium nitride 

So far our tests of the Gaussian Approximation Potentials were limited to 
single-species systems, but the framework can be extended to multispecies 
systems. Here we report our first attempt to model such a system, the 
cubic phase of gallium nitride. Gallium nitride (GaN) is a two-component 
semiconductor with a wurtzite or zinc-blende structure. There is a charge 
transfer between the two species. 

As in our previous work, the configurations for fitting the GAP model 
were generated by randomising the lattice vectors of the primitive cell and 
randomly displacing atoms in larger supercells. Owing to the charge trans- 
fer, we need to include the long-range Coulomb-interaction in our model. 
We decided to use the charges obtained from the population analyses of 
the ground state electronic structure of a number of atomic configurations. 
Due to the fact that these configurations are similar, the fluctuation of the 
atomic charges was not significant, hence we chose to use a simple, fixed 
charge model with — 1 e charge on the nitrogen atoms and 1 e charge on the 
gallium atoms. We calculated the electrostatic forces and energies for each 
training configuration by the standard Ewald-technique [18] and subtracted 
these from the forces and energies obtained from the Density Functional 
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DFT force / eV/A 



Figure 6.20: Force components in GaN predicted by GAP vs. DFT forces. 
The diagonal line is the f{x) = x function, which represents the perfect 
correlation. The inset depicts the distribution of the difference between the 
force components. 



Theory calculations. We regarded the remaining forces and energies as the 
short-range contribution of the atomic energies, and these were used for 
the regression to determine the GAP potential. The cutoff of the GAP 
potential was chosen to be 3.5 A, Jmax = 5 and we sparsified the training 
configurations using 300 sparse points. 

We checked the correlation of the predicted forces of the resulting GAP 



potential with the ab initio forces, and the results are shown in figure 6.20 



We used 64-atom configurations where the atoms were randomly displaced 
by similar amounts to the training configurations. The phonon spectrum 



calculated by GAP is shown in figure 6.21 and the elastic moduli are listed 
in table 16.61 

Even this simple GAP model for gallium nitride shows remarkable ac- 
curacy in these tests, which we take as evidence that we can adapt GAP to 
multispecies systems. However, in the case of very different neighbourhood 
configurations we will probably have to include variable charges, and we 
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Figure 6.21: Phonon spectrum of GaN, calculated by GAP (solid lines) 
and PBE-DFT (open squares). 
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Table 6.6: Elastic moduli of GaN in units of GPa calculated using PBE- 
DFT and GAP. 



will possibly have to consider the contributions of multipole interactions in 
the long-range part of the potential. This is the subject of future research. 



6.6 Atomic energies from GAP 



In section [6.1.2| we investigated a possible definition of atomic energies based 
on localised atomic basis sets. According to our results in section [6. 1.5 , how- 
ever, those atomic energies could not be used in our potential generation 
scheme because they showed a large variation between numerically identical 
local environments. Instead, we employed some extensions of the Gaussian 
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Process regression method — learning from derivatives, use of linear com- 
bination of function values and sparsification — , which make the explicit 
definition of atomic energies unnecessary. Nonetheless, we found it striking 
that an alternative possible definition of the quantum mechanical atomic 
energies, i.e. the ones inferred by the Gaussian Approximation Potentials 
appeared to be successful. In other words, using these atomic energies we 
can obtain the most commensurate forces and total energies for a given 
spatial cutoff, therefore these atomic energies are optimal in this sense. 

We show two examples which demonstrate that the atomic energies 
predicted by GAP are consistent with physical considerations. In the first 
application, we calculated the atomic energies of the atoms in a 96-atom 
slab of diamond, which had two (111) surfaces. The training configura- 
tions were generated by scaling the lattice vectors and positions of the 
atoms of the minimised configuration by a constant factor and randomising 
the atomic positions, and each of these steps was started from a previous 
one. This means that in 20 steps, we created a series of samples between 
the minimised structure and a completely randomised, gas-like configura- 
tion. We calculated the total energy and the forces of the configurations by 
DFTB[76], and used these to train a GAP model. The cutoff of the model 
was 2.75 A and the atomic environments were represented by 100 sparse 
teaching points. We used this model only to determine the atomic energies 
in the original slab. The atomic energies of the carbon atoms as a function 



of their distance from the surface are plotted in figure 6.22 It can been seen 
that the atomic energy is higher at the surface and then gradually reaches 
the bulk value towards the middle of the slab. 

We also calculated the atomic energies defined by GAP in a gallium- 
nitride crystal where permutational defects were present. We created two 
configurations which contained such defects. The first one was generated 
by swapping the positions of a gallium and nitrogen atom in a 96-atom 
wurtize-type supercell, and then we swapped the positions of another pair 
to generate the second configuration. We calculated the total energies and 
forces of the two configurations by Density Functional Theory and used this 
data to train a very simple GAP potential. The cutoff of the model was 
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Figure 6.22: Atomic energies of carbon atoms in a slab of diamond with two 
(111) surfaces as a function of the distance of the atom from the surface. 



3.5 A and we used six sparse point to represent the atomic environments. 
We used this model to calculate the atomic energies in the same two con- 
figurations. Certainly, the resulting potential is not a good representation 
of the quantum mechanical potential energy surface, but it still detects the 
defects and predicts higher atomic energies for the misplaced atoms. 



Figure 6.23 shows the configurations with the defects and the perfect lat- 
tice. The colouring of atoms represent their atomic energies. It can be seen 
that the atomic energies of the atoms forming the defect and surrounding 
it are higher. 

In random structure search applications |77j GAP can be directly em- 
ployed to detect permutational defects. If there are more than one species 
present in the structure, the structure search can result in many similar 
lattices, none of which are perfect, because of the large number of permu- 
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Figure 6.23: The atomic energies in gallium-nitride crystals. We show the 
perfect wurtzite structure on the left, a crystal containing a single defect 
(a gallium and a nitrogen atom swapped) is on the middle and a crystal 
containing two defects (a further gallium-nitrogen pair swapped) is on the 
right. The smaller spheres represent the nitrogen atoms, the larger ones 
represent the gallium atoms. The coloured bar on the right shows the 
energy associated with the colour shades, in eV. 



tations of different species. GAP models, which are generated on the fly, 
can be used to suggest swaps of atoms between the local minima already 
found, which can then result in lower energy structure. Using GAP as an 
auxiliary tool in such structure searches can possibly achieve a significant 
speedup in searching for the global energy minimum. 

6.7 Performance of Gaussian 
Approximation Potentials 

The total computational cost of Gaussian Approximation Potentials con- 
sists of two terms. The first term, which is a fixed cost, includes the compu- 
tation of the ab initio forces and energies of the reference calculations and 
the generation of the potential. The time required to generate the poten- 
tial scales linearly with the number of atomic environments in the reference 
configurations and the number of sparse configurations. In our applications, 
performing the DFT calculations typically took 100 CPU hours while the 
generation of a GAP potential was about a CPU hour. 

Even for small systems, GAP potentials in our current implementa- 
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tion are order of magnitudes faster than Density Functional Theory, but 
significantly — about a hundred times — more expensive than analytical po- 
tentials. Calculation of the energies and forces requires about 0.01 s for 
every atom on a single CPU core. For comparison, a timestep of a 216-atom 
simulation cell takes about 190 s per atom on a single core by CASTEP, 
which corresponds to 20,000-fold speedup. The same calculation for iron 
would take a miUion times longer by CASTEP. 



7 Conclusion and further work 



During my doctoral studies, I implemented a novel, general approach to 
building interatomic potentials, which we call Gaussian Approximation Po- 
tentials. Our potentials are designed to reproduce the quantum mechanical 
potential energy surface (PES) as closely as possible, while being signifi- 
cantly faster than quantum mechanical methods. To achieve this, we used 
the concept of Gaussian Process from Inference Theory and the bispectral 
representation of atomic enviroments, which we derived and adapted using 
the Group Theory of rotational groups. 

I tested the GAP models on a range of simple materials, based on data 
obtained from Density Functional Theory. I built interatomic potentials for 
the diamond lattices of the group IV semiconductors and I performed rig- 
orous tests to evaluate the accuracy of the potential energy surface. These 
tests showed that the GAP models reproduce the quantum mechanical re- 
sults in the harmonic regime, i.e. phonon spectra, elastic properties very 
well. In the case of diamond, I calculated properties which are determined 
by the anharmonic nature of the PES, such as the temperature dependence 
of the optical phonon frequency at the P point and the temperature depen- 
dence of the thermal expansion coefficient. Our GAP potential reproduced 
the values given by Denstity Functional Theory and experiments. 

These potentials constituted our initial tests of the scheme, and rep- 
resented only a small part of the PES. In the case of carbon, I extended 
the GAP model to describe graphite, the diamond (111) surface and vacan- 
cies in the diamond lattice. I found that the new GAP potential described 
the rhombohedral graphite-diamond transition, the surface energies and the 
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vacancy migration remarkably well. 

To show that our scheme is not limited to describing monoatomic semi- 
conductors, I generated a potential for bcc iron, a metal, and for gallium 
nitride, an ionic semiconductor. Our preliminary tests, which were the 
comparison of the phonon dispersion and the elastic moduli with Density 
Functional Theory values, demonstrate that GAP models can easily be built 
for different kinds of materials. I also suggest that the Gaussian Approxi- 
mation Potentials can be generated on the fly and used as auxiliary tools 
for example, in structure search applications. 

7.1 Further work 

In my thesis I presented preliminary tests and validation of our potential 
generation scheme. In the future, we intend to build models and perform 
large scale simulations on a wide range of materials. The first step will be to 
create a general carbon potential, which can describe amorphous and liquid 
carbon at a wide range of pressures and temperatures as well as defects and 
surfaces. We are also planning to create "disposable" potentials, which can 
be used, for instance, in the case of crack simulations. These do not have 
to be able to describe the high-temperature behaviour of the materials, 
as only a restricted part of the configurational space is accessible under 
the conditions of the simulation. The description of electrostatics will be 
soon implemented, with charges and polarisabilities which depend on the 
local environment and the electric field. This will allow us to simulate 
more complex systems, for example silica or water and our ultimate aim 
is to build interatomic potentials — force fields — for biological compounds. 
None of these potentials have to be based on Density Functional Theory, 
for instance it might be necessary to use more accurate solutions of the 
electronic Schrodinger equation. Finally, using GAP as a post-processing 
tool to determine atomic energies derived from on Quantum Mechanics is 
also a future direction of our research, for example, in structure searches. 
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A Woodbury matrix identity 



The likelihood function in equation |3.46| is used during the sparsification 
procedure in order to optimise the hyperparameters and the sparse points. 
At first sight, it seems that the inverse of an x matrix has to be 
calculated, the computational cost of which would scale as A^^. However, 
by using the matrix inversion lemma, also known as the Woodbury matrix 
identity, the computational cost scales only with NM"^ if N » M. If 
we want to find the inverse of a matrix, which can be written in the form 
Z + UWV"^, the Woodbury matrix identity states that 

(z + uwv'^)-^ = Z"^ - z-^u(w-^ + v^z-^u)-^v'^z-\ (A.l) 

In our case, Z is an A^ x A^ diagonal matrix, hence its inverse is trivial, and 
is M X M. The order of the operations can arranged such that none 
of them requires more than NM^ fioating point operations: 

t"^(CArA/Cj^/CA/Ar + T) "^t = 

t^T "^t — (t"^T ^)CNMiCj^,j + Cmn"^ ^Cnm) ^CmnC^ "^t), (A. 2) 



where T = A + a^I. In the evaluation of the second term in equation 3.46 
we used the matrix determinant lemma, which is analogous to the inversion 
formula: 

det(Z + UWV^) = det(W"^ + V^Z^^U) det(W) det(Z). (A.3) 

In our implementation, the determinants are calculated together with the 
inverses, without any computational overhead. 
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We also note that at certain values of the hyperparameters the matrix 
Cm is ill conditioned. In the original Gaussian Process, the covariance 
matrix Q can also be ill conditioned, but by adding the diagonal matrix a^I 
this problem is eliminated, except for very small values of the ai, parameters. 
Snelson suggested [78 j that a small diagonal matrix ,^^1 should be added to 
Cm to improve the condition number of the matrix. This small "jitter" 
factor can be regarded as the internal error of the sparsification. 



B Spherical harmonics 



B.l Four-dimensional spherical harmonics 

The spherical harmonics in three dimensions are the angular part of the 
solution of the Laplace equation 

92 92 92 \ 



dx^ dy"^ dz^ 

This concept can be generalised to higher dimensions. In our case, we need 
the solutions of the four dimensional Laplace equation 

Q2 Q2 Q2 Q2 \ 



dx^ dy"^ dz^ dzl 

which can be written in the form of the three-dimensional rotation matrices, 
the Wigner D-f unctions. 

The definition of the elements of the rotational matrices is 



D^L'iR) = {yim\mm'), (B.3) 



where the rotation R is defined by three rotational angles. The rotational 
operator is usually described as three successive rotations 

• rotation about the z axis by angle a, 

• rotation about the new y' axis by angle /3, 

• rotation about the new z' axis by angle 7, 
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where a, P and 7 are called the Euler-angles. The Wigner D-functions 
are usually formulated as the function of these three angles and denoted 
as D^J^^/(Q;, ^, 7). However, in some cases the rotation can be described 
more conveniently in terms of ou, 9 and 0, where the rotation is treated as 
a single rotation through angle u about the axis n{9,(f)). The vector n is 
determined by the polar angles 9 and (f). 

The rotational matrices in the form U^j^,{u!, 9,(f)), where the four dimen- 
sional polar angles are 29o = ou, 9 and are the four dimensional spherical 
harmonics. 

The matrix elements can be constructed as 



^ ^y{J+My.{J-My.{J+M'y.(J-M'y. , _ 

s\(s+M+M'y{J-M-sy.{J-M'-sy\ 

M + M' > 



E 



^y{J+My.{J-My.{J+M'y.{J-M'y , _ 

s s\(s-M-M'y{J+M-sy(J+M'-sy \ 

M + M' < 



X 



-2\s 



(B.4) 



where 



V = sin 6q sin 6 

u = cos 9q — i sin 6*0 cos 9. 



(B.5) 
(B.6) 



In our application, each time an entire set of U'l^j^j, has to be calculated, 
thus the use of recursion relation is computationally more efficient. The 
recursion relations are 



^MM'(^0, 9, (f)) — 



J + M 



J-M' 
for M' J 



-ve 



'K~AM^^i(^o,u:,9) 



(B.7) 
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and 

for M' ^ -J 

The actual implementation does not involve the explicit calculation of the 
polar angles, we calculate the spherical harmonics in term of the Carte- 
sian coordinates x, y, z and zq. The first two four-dimensional spherical 
harmonics are 

C/°o = 1 (B.9) 

and 

±2±2 V2 r 
Uli 1 = B.ll 

which are indeed analogous to their three-dimensional counterparts. 

B.2 Clebsch-Gordan coefficients 

We used the following formula to compute the Clebsch-Gordan coefficients: 



X 



E 



X ^J{a + a)!(a - + /3)!(6 - /3)!(c + 7)!(c - 7)!(2c + 1) 

(-1)^ 



2;!(a -F6-c-2;)!(a-Q!-2;)!(6-F^-2;)!(c-6-FQ;-F2;)!(c-a-^-Fz)!' 

(B.12) 

where A-symbol is 



A(a6c) = ,/ ''' + ^-f''-^ + '^)''-° + ^ + '^)' . (B.13) 
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